Introduction
Java 8 introduced Streams to the JDK. This enabled us to perform functional style operations on a stream of elements. We can create a stream from any Collection. The biggest feature of streams is that they are lazy. Until a terminal operation is invoked, the preceding steps in the stream pipeline will not be invoked. There are also short-circuiting operations that we can perform on a stream like findFirst and findAny. With this we can get a result from an infinite stream. But there is a scenario where streams do not behave lazy (when using Java 8 streams flatMap). Without directly diving into it, I will start with an example and by a refactoring, I will demonstrate the unexpected behaviour of streams that you need to be aware of.
A very quick tour of Java 8 streams
Before we start, for those not aware of Java 8 streams, I’ll explain the two most important parts that you need to be aware of viz., laziness and the ability to short-circuit. I’ll touch upon the stateful and stateless nature of the streams too. If you know these, feel free to skip to the next section.
Laziness
A stream pipeline has a source (typically originating from a collection), a list of intermediate steps and a terminal step. As I alluded in the introduction, a stream pipeline once constructed is lazy. It will not perform any of the intermediate operations until you invoke a terminal operation on it.
Key point: Intermediate operations are always lazy.
Let us say we have a list of integers and we create a new stream with a mapper that transforms each integer by adding 5 to it.
Stream<Integer> stream = integers.stream()
.map(num -> num + 5);
The above code does not do the traversal or iterating through the input integers. It happens only when we add a terminal operation.
Let us add an intermediate operation to filter the numbers greater than 100. Then, we will count the number of integers that satisfy that predicate (count is a termination operation)
long result = stream.filter(num -> num > 100)
.count();
System.out.println(result);
Short circuiting
This is one of the most beautiful parts of the streams. The processing pipeline stops when the desired result is reached. In other words, it short-circuits. The desired result depends on what terminal operation we are using.
From a list of integers, to find the first number greater than 100, we can do like
Optional<Integer> firstNumGreaterThan100 = integers.stream()
.filter(num -> num > 100)
.findFirst();
This behaves like a traditional for loop that breaks once it finds a number that satisfies the predicate (num > 100). The rest of the numbers will not be operated upon.
This is useful for performance reasons. Say we have a million items and we want to find the first (or any) item that match a condition. Short-circuiting does that.
But this is most useful when dealing with infinite streams. The source of the stream is infinite i.e., it keeps on generating items indefinitely. In such a case, the only way to get a result and break out of the stream pipeline computation is to use a terminal operation.
Stream.iterate(0, i -> i + 1)
.filter(n -> n > 100)
.findFirst()
.ifPresent(System.out::println);
The above code generates a stream of numbers starting with 0, in increments of 1. Thus, we break and print when we encounter the first number greater than 100 i.e., 101. This example may seem trivial, but is of immense help in real applications dealing with a constant stream of (infinite) data.
Stateful and stateless
The last part I want to touch upon is statefulness in a stream pipeline. The map, filter operations are stateless. They do not depend upon the previous item seen when processing an element. They solely operate on one item at a time. But there are stateful operations like sorted and distinct. The former sorts the elements in the stream (by the natural ordering of the items in the stream or by using a passed Comparator). The latter removes duplicate items in the stream. Thus, it is necessary for such stateful operations to see the entire stream for their operation. In other words, they cannot produce a result before they’ve seen the entire stream. Hence, they cannot be used on an infinite stream source. It also makes no sense to sort an infinite stream or to get only the distinct elements from such a stream.
GameObject example
We are back to the core of this post. Let us say, we have a list of GameObjects. Each GameObject has a name, weight and a color (default is white). We have a list of GameObjects and want to get the first object whose weight is more than 20.
The code for this looks like
public class GameObject {
private String name;
private int weight;
private String color;
public GameObject(String name, int weight) {
this(name, weight, "white");
}
public GameObject(String name, int weight, String color) {
this.name = name;
this.weight = weight;
this.color = color;
}
public String getName() {
return name;
}
public int getWeight() {
return weight;
}
public String getColor() {
return color;
}
}
List<GameObject> gameObjects = new ArrayList<>();
gameObjects.add(new GameObject("marble", 10));
gameObjects.add(new GameObject("ball", 20));
gameObjects.add(new GameObject("board", 30));
gameObjects.add(new GameObject("coin", 40));
gameObjects.stream()
.peek(gameObject -> System.out.println("Object: " + gameObject.getName()))
.filter(gameObject -> gameObject.getWeight() > 20)
.findFirst()
.ifPresent(gameObject -> System.out.println("Result object " + gameObject.getName()));
I have added a peek to the stream. A peek is for debugging and it helps to see what elements go past the stream at that point.
Since the board object is the first whose weight is more than 20, the stream will terminate and it will not process the coin object. You can see this from what the peek prints. The output for the above snippet will be,
Object: marble
Object: ball
Object: board
Result object board
Pretty slick!!
Mapping to multiple color objects
Let us take a GameObject and generate multiple game objects with different colors. We will map one game object to three game objects of colors blue, green and red. Since, we are mapping one element in the stream to multiple elements, we will use flatMap.
private static List<GameObject> coloredObjects(GameObject gameObject) {
List<GameObject> coloredGameObjects = new ArrayList<>();
String name = gameObject.getName();
int weight = gameObject.getWeight();
coloredGameObjects.add(new GameObject(name, weight, "blue"));
coloredGameObjects.add(new GameObject(name, weight, "green"));
coloredGameObjects.add(new GameObject(name, weight, "red"));
return coloredGameObjects;
}
gameObjects.stream()
.flatMap(gameObject -> coloredObjects(gameObject).stream())
.peek(gameObject -> System.out.println("Object:" + gameObject.getColor()
+ " " + gameObject.getName()))
.filter(gameObject -> gameObject.getWeight() > 20)
.findFirst()
.ifPresent(gameObject -> System.out.println("Result object " +
gameObject.getColor() + " " + gameObject.getName()));
Running it on Oracle JDK 8, the output looks like,
Object:blue marble
Object:green marble
Object:red marble
Object:blue ball
Object:green ball
Object:red ball
Object:blue board
Object:green board
Object:red board
Result object blue board
The marble and the ball objects do not satisfy the condition. Hence, we see three prints for those two objects from line 1 to 6.
What is interesting is the next three lines. The GameObject for blue-board will be the first object that is of weight more than 30. With this, our stream pipeline should have terminated, but it didn’t. It still printed the board objects of color green and red before printing the result (blue-board).
Java 8 Streams flatMap is not lazy!!
It turns out to be a bug in Java 8 where flatMap is not lazy and does not play well with the short-circuiting. It is true that the stream pipeline is lazy and does short-circuit for the terminal operation like findFirst. Each stage or step in the pipeline (like filter, map) creates a new Stream. Flatmap is not lazy and does not adhere with the terminal step’s short-circuiting nature. As a flatMap maps one element to n elements, it creates all the n elements irrespective of what the nature of the next step is. This is the reason why we saw the green and red board object printed by the peek.
Imagine what if our flatMap step generated 1 million colored objects? It would have affected the performance, because it has to still generate those 1 million object for each game object even though we need just the first object whose weight is more than 30.
Relevant links to JDK bug database
You can find the JDK bug filed for this here - Stream.flatMap() causes breaking of short-circuiting of terminal operations. It has been backported to OpenJDK 8u222 (JDK-8225328)
This fix is part of Java 10. Running the above (2nd snippet) in Java 10 produces the expected result. It terminates when it sees the first color (blue) of the board object.
Object:blue marble
Object:green marble
Object:red marble
Object:blue ball
Object:green ball
Object:red ball
Object:blue board
Result object blue board
Conclusion
We learnt how a Streams pipeline evaluates lazily and offers short-circuiting terminal operations. We then started with a simple stream pipeline that used findFirst to get the first item satisfying a condition. Then, by using flatMap intermediate operation, we saw how it broke the laziness and short-circuiting and learnt that Java 8 streams flatMap is not lazy.
If you are using the non-backported version of Java 8, you need to be aware of this behaviour in Java 8 streams flatmap.
References
- Check out the Stream operations and pipelines section of the Java 8 Streams package specification.
- Stackoverflow Questions
- https://blog.jooq.org/are-java-8-streams-truly-lazy-not-completely
Here are some of the other posts related to Java streams
You can see all the stream related post under the java stream tag