Collectors filtering, flatMapping and teeing

Overview

Collectors is a utility class in java.util.stream whose methods return implementations of Collector that perform various reduction operations, such as accumulating elements into collections (e.g., a list or a set) and summarizing elements according to various criteria. Having already seen Collectors toMap and Collectors groupingBy, in this post we will learn about the methods added to Collectors in Java 9 and Java 12 (Collectors filtering, flatMapping and teeing).

Example setup

We will use the below Staff class for demonstration of the new Collector methods.

import java.util.Set;

public class Staff {
    private long id;
    private String name;
    private int age;
    private String department;
    private long salary;
    private Set<String> courses;

    Staff(long id, String name, int age, String department, long salary, Set<String> courses) {
        this.id = id;
        this.name = name;
        this.age = age;
        this.department = department;
        this.salary = salary;
        this.courses = courses;
    }

    public long getId() {
        return id;
    }

    public String getName() {
        return name;
    }

    public int getAge() {
        return age;
    }

    public String getDepartment() {
        return department;
    }

    public long getSalary() {
        return salary;
    }

    public Set<String> getCourses() {
        return courses;
    }

    @Override
    public String toString() {
        return "id = " + id
                + ", name = " + name;
    }
}

A Staff has an id, a name, an age, the department to which the employee belongs, the salary and the set of courses the employee teaches. This assumes that a Staff belongs to only one department. (The salary numbers are arbitrary.)

Building list of Staff objects

For this post, I’ll use the list of Staff objects shown below.

List<Staff> staffs = new ArrayList<>();
staffs.add(new Staff(1L, "A", 32, "CS", 5000, Set.of("Computer Architecture", "DS", "Algorithms")));
staffs.add(new Staff(2L, "B", 25, "Math", 2800, Set.of("Discrete Maths")));
staffs.add(new Staff(3L, "C", 29, "CS", 4500, Set.of("DS", "Algorithms")));
staffs.add(new Staff(4L, "D", 37, "Science", 2500, Set.of("Quantum Physics", "Thermodynamics")));
staffs.add(new Staff(5L, "E", 41, "Math", 5900, Set.of("Discrete Maths", "Probability")));

Grouping by a classifier and filtering

We have a list of Staff objects. We want to create a mapping of the department name to the list of salaries of the employees (staffs) in that department.

Let us first use Collectors.groupingBy to group the staffs by their department.

Map<String, List<Staff>> departmentToStaffs = staffs.stream()
        .collect(Collectors.groupingBy(Staff::getDepartment, Collectors.toList()));

The above code groups the staffs using a classifier function, which here returns the department name. The downstream collector collects the Staff objects of each group into a list.
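For reference, a minimal sketch of the downstream collector's role, using a few hypothetical words instead of Staff objects: the one-argument groupingBy overload behaves like groupingBy(classifier, toList()), so both methods below should build the same map. The class name GroupingByDemo and the sample words are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; the words are arbitrary sample data
class GroupingByDemo {
    // Group words by length, with an explicit toList() downstream collector
    static Map<Integer, List<String>> explicitDownstream() {
        return Stream.of("DS", "CS", "Math", "Physics")
                .collect(Collectors.groupingBy(String::length, Collectors.toList()));
    }

    // The one-argument overload also collects each group into a list
    static Map<Integer, List<String>> defaultDownstream() {
        return Stream.of("DS", "CS", "Math", "Physics")
                .collect(Collectors.groupingBy(String::length));
    }

    public static void main(String[] args) {
        System.out.println(explicitDownstream());
        System.out.println(defaultDownstream());
    }
}
```

Passing the downstream collector explicitly is what lets us swap toList for other collectors, as the rest of this post does.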

Since we want only the salary of each staff and not the Staff object itself, we can use Collectors.mapping as the downstream collector.

Map<String, List<Long>> departmentToSalaries = staffs.stream()
        .collect(Collectors.groupingBy(Staff::getDepartment,
                Collectors.mapping(Staff::getSalary, Collectors.toList())));
System.out.println(departmentToSalaries);

Collectors.mapping is a downstream collector that maps each Staff to their salary, and Collectors.toList collects the salaries into a list. This prints:
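Because Collectors.mapping adapts whatever downstream collector it is given, toList is only one option. A small sketch with hypothetical sample words (not the Staff data), swapping in Collectors.joining to concatenate the mapped values of each group:

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class with arbitrary sample words
class MappingDemo {
    // Group by length; mapping upper-cases each word, then joining concatenates each group
    static Map<Integer, String> upperCaseByLength() {
        return Stream.of("ds", "cs", "math")
                .collect(Collectors.groupingBy(String::length,
                        Collectors.mapping(word -> word.toUpperCase(), Collectors.joining(", "))));
    }

    public static void main(String[] args) {
        System.out.println(upperCaseByLength());
    }
}
```

Within each group the words keep their encounter order, so the length-2 group should join as "DS, CS".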

{CS=[5000, 4500], Science=[2500], Math=[2800, 5900]}

Now, let us say we want the mapping from department to the list of salaries, keeping only the salaries that are greater than 2800.

We can add a filter to our stream to filter out the Staffs whose salary is not greater than 2800 (i.e., 2800 or less).

Map<String, List<Long>> departmentToSalaries = staffs.stream()
        .filter(staff -> staff.getSalary() > 2800)
        .collect(Collectors.groupingBy(Staff::getDepartment,
                Collectors.mapping(Staff::getSalary, Collectors.toList())));
System.out.println(departmentToSalaries);

This will print,

{CS=[5000, 4500], Math=[5900]}

But there is a problem here. The Science department does not appear in the final result. This is because all the Staffs in the Science department were filtered out by our condition (this example had only one Staff in the Science department).

Filtering Collector - Collectors.filtering

Added in: Java 9

We can use the filtering collector (the Collectors.filtering method) to overcome this problem. Filtering collectors are most useful in multi-level reductions, as the downstream collector of a groupingBy or partitioningBy.

It adapts a Collector by applying a predicate to each input element and accumulating the element only if the predicate returns true.

Thus, in our example, even if none of the Staffs in a department has a salary above the threshold, the department still appears in the result, mapped to an empty list. As we saw earlier, the stream's filter removed those Staffs before grouping, so the department was missing from the resulting map. This is how a filtering collector differs from a stream's filter operation.
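To make the difference concrete, here is a small self-contained sketch using hypothetical sample words rather than the Staff data: we group words by their first letter and keep only words longer than three characters, once with Stream.filter and once with Collectors.filtering.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class; the words are arbitrary sample data
class FilterVsFiltering {
    static final List<String> WORDS = List.of("apple", "avocado", "banana", "fig");

    // Stream.filter runs before grouping: "fig" is removed, so the key 'f' never appears
    static Map<Character, List<String>> withStreamFilter() {
        return WORDS.stream()
                .filter(w -> w.length() > 3)
                .collect(Collectors.groupingBy(w -> w.charAt(0)));
    }

    // Collectors.filtering runs inside each group: 'f' is still created, with an empty list
    static Map<Character, List<String>> withFilteringCollector() {
        return WORDS.stream()
                .collect(Collectors.groupingBy(w -> w.charAt(0),
                        Collectors.filtering(w -> w.length() > 3, Collectors.toList())));
    }

    public static void main(String[] args) {
        System.out.println(withStreamFilter());
        System.out.println(withFilteringCollector());
    }
}
```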

Collectors.filtering method signature

public static <T, A, R>
Collector<T, ?, R> filtering(Predicate<? super T> predicate,
                             Collector<? super T, A, R> downstream)

The filtering method accepts a Predicate as its first argument and a downstream collector as its second argument. It passes to the downstream collector only those elements for which the predicate returns true.
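Because filtering wraps any downstream collector, it also works under partitioningBy (mentioned above) and with reducing collectors such as Collectors.counting. A minimal sketch on the numbers 1 to 6, chosen purely for illustration; the class name is hypothetical.

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class; the numbers 1..6 are arbitrary sample data
class PartitionFilteringDemo {
    // Split 1..6 into even (true) and odd (false), then count only elements greater than 3
    static Map<Boolean, Long> countLargeByParity() {
        return IntStream.rangeClosed(1, 6)
                .boxed()
                .collect(Collectors.partitioningBy(n -> n % 2 == 0,
                        Collectors.filtering(n -> n > 3, Collectors.counting())));
    }

    public static void main(String[] args) {
        // Evens above 3 are 4 and 6; the only odd number above 3 is 5
        System.out.println(countLargeByParity());
    }
}
```

partitioningBy always contains both the true and false keys, so a partition in which no element passes the predicate would simply map to a count of 0.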

Using Collectors.filtering as a downstream collector

Let us use Collectors.filtering to create the mapping of department name to a list of salaries, keeping only the salaries that are greater than 2800.

Map<String, List<Long>> departmentToSalaries = staffs.stream()
        .collect(Collectors.groupingBy(Staff::getDepartment,
                Collectors.filtering(staff -> staff.getSalary() > 2800,
                        Collectors.mapping(Staff::getSalary, Collectors.toList()))));
System.out.println(departmentToSalaries);

First, we use a groupingBy collector to group by the department name. Then we use a filtering collector as its downstream collector. We pass the predicate staff -> staff.getSalary() > 2800, which keeps only the staffs whose salaries are greater than 2800. The downstream collector of the filtering collector is the mapping collector (the same one used in the previous example): it maps each Staff to their salary and collects the salaries into a list. Running the above code prints:

{CS=[5000, 4500], Science=[], Math=[5900]}

As you can see, the Science department is now present, mapped to an empty list.

Grouping and mapping a value to more than one value

In this example, let us say we want to create a mapping of department name to the set of courses taught in that department.

Let us start with groupingBy and mapping Collectors and see what we get.

Map<String, Set<Set<String>>> departmentToCoursesSet = staffs.stream()
        .collect(Collectors.groupingBy(Staff::getDepartment,
                Collectors.mapping(Staff::getCourses, Collectors.toSet())));
System.out.println(departmentToCoursesSet);

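We group by the department name and use a mapping collector to map each Staff object to its set of courses, collecting these sets into a set. This prints something like the following (the iteration order of sets created with Set.of is not guaranteed, so the order of courses may differ):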

{CS=[[Algorithms, DS], [Algorithms, Computer Architecture, DS]], Science=[[Thermodynamics, Quantum Physics]], Math=[[Probability, Discrete Maths], [Discrete Maths]]}

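The result is a Set<Set<String>>: each Staff's courses stay in their own inner set, so a course taught by more than one staff (e.g., Algorithms and DS in CS) appears more than once. We want the result to be a flat Set<String>.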

FlatMapping Collector - Collectors.flatMapping

Added in: Java 9

Like the filtering collector, the flatMapping collector is used in multi-level reductions. It has the following signature:

public static <T, U, A, R>
Collector<T, ?, R> flatMapping(
    Function<? super T, ? extends Stream<? extends U>> mapper,
    Collector<? super U, A, R> downstream)

It accepts a Function that maps an element of type T to a stream of elements of type U. It applies this function to each input element, and the elements of each resulting stream (of type U) are passed to the downstream collector. Each mapped stream is closed after its contents have been passed downstream.
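A small sketch of both behaviours, using flatMapping at the top level rather than under groupingBy. The sample course sets and the closedStreams counter are hypothetical; the counter is attached with Stream.onClose so we can observe that each mapped stream gets closed.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

// Hypothetical demo class; the course sets are arbitrary sample data
class FlatMappingDemo {
    // Incremented by the close handler of every mapped stream
    static final AtomicInteger closedStreams = new AtomicInteger();

    static Set<String> allCourses() {
        closedStreams.set(0);
        List<Set<String>> courseSets = List.of(
                Set.of("DS", "Algorithms"),
                Set.of("Algorithms", "Computer Architecture"));
        return courseSets.stream()
                .collect(Collectors.flatMapping(
                        // Map each inner set to a stream that records when it is closed
                        courses -> courses.stream().onClose(closedStreams::incrementAndGet),
                        Collectors.toSet()));
    }

    public static void main(String[] args) {
        System.out.println(allCourses());          // three distinct courses, order unspecified
        System.out.println(closedStreams.get());   // one close per mapped stream
    }
}
```

The two inner sets share "Algorithms", but the flattened result holds it only once, and the counter should equal the number of mapped streams.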

Using Collectors.flatMapping as a downstream collector

Let us use Collectors.flatMapping to collect the set of courses belonging to a department.

Map<String, Set<String>> departmentToCourses = staffs.stream()
        .collect(Collectors.groupingBy(Staff::getDepartment,
                Collectors.flatMapping(staff -> staff.getCourses().stream(),
                        Collectors.toSet())));
System.out.println(departmentToCourses);

The groupingBy collector groups the stream of Staffs by department name. We pass flatMapping a mapper that maps a Staff object to a stream of its courses (staff -> staff.getCourses().stream()). The downstream collector of flatMapping collects the elements of these streams into a set.

Running the above code snippet prints:

{CS=[Algorithms, Computer Architecture, DS], Science=[Thermodynamics, Quantum Physics], Math=[Probability, Discrete Maths]}

Applying two groupingBy functions

Let us create a new POJO called DepartmentDetails:

public class DepartmentDetails {
    private String departmentName;
    private double averageSalary;
    private double averageAgeOfEmployee;

    // Package-private so the stream code below can construct instances
    DepartmentDetails(String departmentName, double averageSalary, double averageAgeOfEmployee) {
        this.departmentName = departmentName;
        this.averageSalary = averageSalary;
        this.averageAgeOfEmployee = averageAgeOfEmployee;
    }

    @Override
    public String toString() {
        return "Department = " + departmentName
                + ", Average age = " + averageAgeOfEmployee
                + ", Average salary = " + averageSalary;
    }
}

A DepartmentDetails object holds the department name, the average salary of the staffs in the department, and the average age of the staffs in that department.

One way to compute this is to first build a map from department to average salary and a map from department to the average age of the staffs in it. Using these two maps, we can build a list of DepartmentDetails.

// Find mapping of department to average salary
Map<String, Double> departmentToAverageSalaryMap = staffs.stream()
        .collect(Collectors.groupingBy(Staff::getDepartment,
                Collectors.averagingLong(Staff::getSalary)));

// Find mapping of department to average age of staffs in the department
Map<String, Double> departmentToAverageAgeMap = staffs.stream()
        .collect(Collectors.groupingBy(Staff::getDepartment,
                Collectors.averagingInt(Staff::getAge)));

// Using the above two mappings construct the DepartmentDetails
List<DepartmentDetails> departmentDetails = departmentToAverageSalaryMap.entrySet()
        .stream()
        .map(departmentToAverageSalary -> new DepartmentDetails(departmentToAverageSalary.getKey(),
                departmentToAverageSalary.getValue(),
                departmentToAverageAgeMap.get(departmentToAverageSalary.getKey())))
        .collect(Collectors.toList());
System.out.println(departmentDetails);

After computing the two independent maps, we stream the entrySet of the first map (department to average salary). For each entry, we look up the average age in the second map (both maps have the same keys, since both group the same list by department). Using these, we construct a DepartmentDetails object and collect the objects into a list.

We had to stream the original list twice to build the two maps. Can we do better?

Collectors.teeing

Added in: Java 12

With Collectors.teeing, we can combine the results of two downstream collectors. Its signature is as follows:

public static <T, R1, R2, R>
Collector<T, ?, R> teeing(Collector<? super T, ?, R1> downstream1,
                          Collector<? super T, ?, R2> downstream2,
                          BiFunction<? super R1, ? super R2, R> merger)

Collectors.teeing returns a collector that is a composite of two downstream collectors. Every element of the stream is passed to both downstream collectors independently, and when the stream is exhausted, their two results are combined by the merger function (a BiFunction).
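As a minimal illustration of the merger, separate from the Staff example, the sketch below computes an average in a single pass by teeing a sum and a count over arbitrary sample numbers. In practice Collectors.averagingInt does this directly; the point here is only how teeing hands both results to the merger. It requires Java 12 or later, and the class name is hypothetical.

```java
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; the input numbers are arbitrary sample data
class TeeingDemo {
    static double average(Stream<Integer> numbers) {
        return numbers.collect(Collectors.teeing(
                Collectors.summingInt(n -> n),   // first downstream: sum of the elements
                Collectors.counting(),           // second downstream: number of elements
                // merger: receives both results once the stream is exhausted
                (sum, count) -> (double) sum / count));
    }

    public static void main(String[] args) {
        System.out.println(average(Stream.of(1, 2, 3, 4))); // 2.5
    }
}
```

For an empty stream this sketch returns NaN (0.0 / 0), which may or may not be what you want.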

departmentDetails = staffs.stream()
        .collect(Collectors.teeing(
                Collectors.groupingBy(Staff::getDepartment, Collectors.averagingLong(Staff::getSalary)),
                Collectors.groupingBy(Staff::getDepartment, Collectors.averagingInt(Staff::getAge)),
                (map1, map2) -> map1.entrySet().stream()
                        .map(departmentToAverage -> new DepartmentDetails(departmentToAverage.getKey(), departmentToAverage.getValue(),
                                map2.get(departmentToAverage.getKey())))
                        .collect(Collectors.toList())));

System.out.println(departmentDetails);

The two groupingBy collectors used to build the two maps earlier are passed as the first two arguments to teeing. The third argument, the merger, is the same logic we used earlier to combine the two maps.
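An alternative design, sketched here as an assumption rather than the approach above: since teeing is itself a collector, it can be the downstream of a single groupingBy, so both averages are computed per department and the merger needs no map lookup. The Emp class is a hypothetical stand-in for Staff holding only the fields used; its data mirrors the staff list from earlier, and each department maps to a list of [average salary, average age].

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class; Emp is a minimal stand-in for the Staff class
class TeeingPerGroupDemo {
    static class Emp {
        private final String department;
        private final int age;
        private final long salary;

        Emp(String department, int age, long salary) {
            this.department = department;
            this.age = age;
            this.salary = salary;
        }

        String getDepartment() { return department; }
        int getAge() { return age; }
        long getSalary() { return salary; }
    }

    // One pass: group by department, and within each group tee two averaging collectors
    static Map<String, List<Double>> salaryAndAgeByDepartment(List<Emp> emps) {
        return emps.stream()
                .collect(Collectors.groupingBy(Emp::getDepartment,
                        Collectors.teeing(
                                Collectors.averagingLong(Emp::getSalary),
                                Collectors.averagingInt(Emp::getAge),
                                // merger: [average salary, average age] for one department
                                (avgSalary, avgAge) -> List.of(avgSalary, avgAge))));
    }

    // Same departments, ages and salaries as the staff list in this post
    static List<Emp> sample() {
        return List.of(
                new Emp("CS", 32, 5000), new Emp("Math", 25, 2800), new Emp("CS", 29, 4500),
                new Emp("Science", 37, 2500), new Emp("Math", 41, 5900));
    }

    public static void main(String[] args) {
        System.out.println(salaryAndAgeByDepartment(sample()));
    }
}
```

The trade-off is that the merger now produces one value per department, so building DepartmentDetails objects would need one more step over the resulting map's entries.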

The result is shown below:

[Department = CS, Average age = 30.5, Average salary = 4750.0, Department = Science, Average age = 37.0, Average salary = 2500.0, Department = Math, Average age = 33.0, Average salary = 4350.0]

Conclusion

In this post, we learnt about the collector methods added in Java 9 and Java 12 (Collectors filtering, flatMapping and teeing) and saw examples of how to use each of them.
