Summary Statistics in Java 8

Introduction

Java 8 added three Summary Statistics classes: IntSummaryStatistics, LongSummaryStatistics and DoubleSummaryStatistics. These are state objects used for collecting statistics such as count, min, max, sum, and average. In this post we will learn about these three classes, with examples, and see how to create them from Java 8 streams.

A use case to find average, minimum and maximum

First, we will see a use case where these Summary Statistics objects can be useful. Let us say we have an Employee class as shown below.

public class Employee {
    private String name;
    private String city;
    private int age;
    private double salary;

    public Employee(String name, String city, int age, double salary) {
        this.name = name;
        this.city = city;
        this.age = age;
        this.salary = salary;
    }

    public String getName() {
        return name;
    }

    public String getCity() {
        return city;
    }

    public int getAge() {
        return age;
    }

    public double getSalary() {
        return salary;
    }
}

It has a name, the city in which the employee works, the age and the salary of the employee. We have a list of employee objects constructed as shown below. (Note that List.of was added in Java 9; on Java 8, Arrays.asList can be used instead.)

List<Employee> employees = List.of(
        new Employee("Doe", "Berlin", 25, 12_000),
        new Employee("John", "Sydney", 22, 9000),
        new Employee("Mary", "Berlin", 29, 13_000),
        new Employee("Mike", "London", 34, 10_500),
        new Employee("Adams", "London", 32, 11_500),
        new Employee("Harry", "Berlin", 35, 12_000));

From the list of employees, we want to find the
  • Average age of the employees.
  • Minimum and maximum age of the employees.

In other words, we want to calculate multiple statistics from the list of employee objects. There are two ways to do this.

Streaming the list multiple times

We stream the list multiple times to find the average, minimum, and maximum.

double avgAge = employees.stream()
        .mapToInt(Employee::getAge)
        .average()
        .orElse(0);

int minAge = employees.stream()
        .mapToInt(Employee::getAge)
        .min()
        .orElse(0);

int maxAge = employees.stream()
        .mapToInt(Employee::getAge)
        .max()
        .orElse(0);

System.out.println("Average age: " + avgAge);
System.out.println("Minimum age: " + minAge);
System.out.println("Maximum age: " + maxAge);

This outputs,

Average age: 29.5
Minimum age: 22
Maximum age: 35

Using a conventional ‘for’ loop

To avoid processing the list more than once, we can use a traditional for-each loop to process the list.

int sumOfAge = 0;
int minimumAge = Integer.MAX_VALUE;
int maximumAge = Integer.MIN_VALUE;
for (Employee employee: employees) {
    sumOfAge += employee.getAge();
    minimumAge = Math.min(minimumAge, employee.getAge());
    maximumAge = Math.max(maximumAge, employee.getAge());
}

System.out.println("Average age: " + ((double) sumOfAge / employees.size()));
System.out.println("Minimum age: " + minimumAge);
System.out.println("Maximum age: " + maximumAge);

This will result in the same output as the first option.

Problems with these options

  • The first option is neat and functional, but we have to process the list thrice. The second option is the good old way of doing this. But in either case, we had to do a lot of work ourselves.
  • In the future, if we want to find the sum of a field, we need to modify our code (in both cases) to compute the sum in addition to the average, min and max.
  • Passing these values around means dealing with a lot of parameters and arguments.

The Summary Statistics classes were added in Java 8 to overcome the above-mentioned problems. Let us look at how we can use an IntSummaryStatistics in this case.

Summary Statistics

A SummaryStatistics object is a state object for collecting statistics such as count, min, max, sum, and average. There are three such classes:

  • IntSummaryStatistics
  • LongSummaryStatistics
  • DoubleSummaryStatistics

Using these we can compute, in a single pass, the count, minimum, maximum, sum and the average of numbers.

IntSummaryStatistics Example

Let us use IntSummaryStatistics for our use case. 

IntSummaryStatistics intSummaryStatistics = new IntSummaryStatistics();
employees.stream()
        .mapToInt(Employee::getAge)
        .forEach(intSummaryStatistics);
printIntSummaryStatistics("Int Summary Statistics", intSummaryStatistics);

We create a new IntSummaryStatistics instance. We create a stream of the employee objects and map each of them to a primitive int using the mapToInt method. The mapToInt method takes a ToIntFunction, which maps an object to a primitive int. Finally, we pass the IntSummaryStatistics instance to the forEach method (this works because an IntSummaryStatistics is an IntConsumer).

The above forEach can be written (expanded as) forEach(a -> intSummaryStatistics.accept(a)) or forEach(intSummaryStatistics::accept).
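
As a minimal sketch of these equivalent forms, each variant below feeds the same values into a fresh IntSummaryStatistics. The class name ForEachForms and its helper methods are made up for illustration, and the ages are the ones from the employee list above. Note that IntSummaryStatistics is not thread safe, so passing it to forEach in this way is only appropriate for sequential streams.

```java
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;

// Hypothetical demo class: three equivalent ways to feed an IntStream
// into an IntSummaryStatistics through forEach.
class ForEachForms {

    // Passes the statistics object itself, since it is an IntConsumer.
    static IntSummaryStatistics viaConsumer(int[] ages) {
        IntSummaryStatistics stats = new IntSummaryStatistics();
        IntStream.of(ages).forEach(stats);
        return stats;
    }

    // Expanded form using an explicit lambda.
    static IntSummaryStatistics viaLambda(int[] ages) {
        IntSummaryStatistics stats = new IntSummaryStatistics();
        IntStream.of(ages).forEach(a -> stats.accept(a));
        return stats;
    }

    // Same call expressed as a bound method reference.
    static IntSummaryStatistics viaMethodReference(int[] ages) {
        IntSummaryStatistics stats = new IntSummaryStatistics();
        IntStream.of(ages).forEach(stats::accept);
        return stats;
    }

    public static void main(String[] args) {
        int[] ages = {25, 22, 29, 34, 32, 35};
        System.out.println("Sum via consumer: " + viaConsumer(ages).getSum());
        System.out.println("Sum via lambda: " + viaLambda(ages).getSum());
        System.out.println("Sum via method reference: " + viaMethodReference(ages).getSum());
    }
}
```

All three variants record the same six values, so they report the same count, sum, minimum and maximum.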
 
I have added printIntSummaryStatistics as a static utility method, which prints the count, average, minimum, maximum and sum of the IntSummaryStatistics passed to it.

public static void printIntSummaryStatistics(String message, IntSummaryStatistics intSummaryStatistics) {
    System.out.println(message);
    System.out.println("Count: " + intSummaryStatistics.getCount());
    System.out.println("Avg: " + intSummaryStatistics.getAverage());
    System.out.println("Min: " + intSummaryStatistics.getMin());
    System.out.println("Max: " + intSummaryStatistics.getMax());
    System.out.println("Sum: " + intSummaryStatistics.getSum());
    System.out.println();
}

The output of running the above code is shown below.

Int Summary Statistics
Count: 6
Avg: 29.5
Min: 22
Max: 35
Sum: 177

The advantages of using an IntSummaryStatistics should now be obvious. It removes the clutter of computing the different statistics ourselves.

Note: Internally, it maintains all the statistics fields (count, sum, min and max) except the average, which is computed on demand when getAverage() is called, by dividing the sum by the count. Thus, for a field like age, even if we don’t want the sum of ages, it will still be computed.
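
To illustrate the note above, here is a small sketch that recomputes the average from the values returned by getSum() and getCount() and compares it with getAverage(). The class AverageOnDemand and its helper methods are hypothetical names used only for this example.

```java
import java.util.IntSummaryStatistics;

// Hypothetical demo class: the average can be derived from sum and count.
class AverageOnDemand {

    // Recomputes the average from the stored sum and count,
    // returning zero when nothing has been recorded.
    static double averageFromSumAndCount(IntSummaryStatistics stats) {
        return stats.getCount() > 0
                ? (double) stats.getSum() / stats.getCount()
                : 0.0;
    }

    // Builds statistics for the six employee ages used in this post.
    static IntSummaryStatistics ageStatistics() {
        IntSummaryStatistics stats = new IntSummaryStatistics();
        for (int age : new int[] {25, 22, 29, 34, 32, 35}) {
            stats.accept(age);
        }
        return stats;
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = ageStatistics();
        long sum = stats.getSum(); // getSum() returns a long, even for int values
        System.out.println("Sum: " + sum);
        System.out.println("Average (getAverage): " + stats.getAverage());
        System.out.println("Average (sum / count): " + averageFromSumAndCount(stats));
    }
}
```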
 

IntSummaryStatistics, LongSummaryStatistics and DoubleSummaryStatistics

Let us look at the three Summary Statistics classes and their methods. All three (IntSummaryStatistics, LongSummaryStatistics and DoubleSummaryStatistics) exist for a similar purpose (to compute the various statistics in a single pass), but each operates on a different data type (int, long and double respectively).

Each of these has five methods for getting the computed statistics.

  • getCount(): Returns the count of values recorded.
  • getSum(): Returns the sum of values recorded, or zero if no values are recorded.
  • getMin(): This method returns the minimum value recorded.
  • getMax(): This method returns the maximum value recorded.
  • getAverage(): Returns the arithmetic mean of values recorded, or zero if no values are recorded.

Shown below are the return values of the getMin() and getMax() methods when no values are recorded.

IntSummaryStatistics:
  • getMin(): This method returns Integer.MAX_VALUE if no values were recorded.
  • getMax(): This method returns Integer.MIN_VALUE if no values were recorded.
LongSummaryStatistics:
  • getMin(): This method returns Long.MAX_VALUE if no values were recorded.
  • getMax(): This method returns Long.MIN_VALUE if no values were recorded.
DoubleSummaryStatistics:
  • getMin(): Returns Double.NaN if any recorded value was NaN, or Double.POSITIVE_INFINITY if no values were recorded.
  • getMax(): Returns Double.NaN if any recorded value was NaN, or Double.NEGATIVE_INFINITY if no values were recorded.

The accept and combine methods

  • IntSummaryStatistics implements IntConsumer
  • LongSummaryStatistics implements LongConsumer and IntConsumer
  • DoubleSummaryStatistics implements DoubleConsumer

Hence, all three SummaryStatistics classes have an accept method. The parameter type varies with the Consumer type they implement (int/long/double); LongSummaryStatistics has both an int and a long overload. We use this method to record a value into the summary information.
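
As a rough illustration of calling accept directly, the sketch below records one int and one long value into a LongSummaryStatistics, which is possible because it implements both IntConsumer and LongConsumer. The class name AcceptOverloads and the sample values are assumptions made for this example.

```java
import java.util.LongSummaryStatistics;

// Hypothetical demo class: recording values by calling accept directly.
class AcceptOverloads {

    // Records one int and one long value into the same statistics object.
    static LongSummaryStatistics record() {
        LongSummaryStatistics stats = new LongSummaryStatistics();
        stats.accept(5);               // resolves to accept(int), from IntConsumer
        stats.accept(10_000_000_000L); // resolves to accept(long), from LongConsumer
        return stats;
    }

    public static void main(String[] args) {
        LongSummaryStatistics stats = record();
        System.out.println("Count: " + stats.getCount());
        System.out.println("Min: " + stats.getMin());
        System.out.println("Max: " + stats.getMax());
        System.out.println("Sum: " + stats.getSum());
    }
}
```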
 
Each of the SummaryStatistics classes has a combine method, which merges the state of another SummaryStatistics object (of the same type) into the current one. Unlike a copy constructor, it does not create a new object; it updates the object it is called on.
 
Example: The implementation of the combine method of IntSummaryStatistics is shown below. It updates the statistics information by combining the current IntSummaryStatistics instance with the passed IntSummaryStatistics.
 
public void combine(IntSummaryStatistics other) {
    count += other.count;
    sum += other.sum;
    min = Math.min(min, other.min);
    max = Math.max(max, other.max);
}
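
A short usage sketch of combine follows. It builds two separate IntSummaryStatistics (for the Berlin and London ages from the employee list) and merges the second into the first. The class CombineExample and the helper statisticsOf are hypothetical names for this illustration.

```java
import java.util.IntSummaryStatistics;

// Hypothetical demo class: merging two statistics objects with combine.
class CombineExample {

    // Hypothetical helper: records the given values into a new statistics object.
    static IntSummaryStatistics statisticsOf(int... values) {
        IntSummaryStatistics stats = new IntSummaryStatistics();
        for (int value : values) {
            stats.accept(value);
        }
        return stats;
    }

    // Merges the London ages into the Berlin statistics and returns the result.
    static IntSummaryStatistics berlinAndLondon() {
        IntSummaryStatistics berlin = statisticsOf(25, 29, 35);
        IntSummaryStatistics london = statisticsOf(34, 32);
        berlin.combine(london); // berlin is updated in place; london is left unchanged
        return berlin;
    }

    public static void main(String[] args) {
        IntSummaryStatistics merged = berlinAndLondon();
        System.out.println("Count: " + merged.getCount());   // 5
        System.out.println("Sum: " + merged.getSum());       // 155
        System.out.println("Min: " + merged.getMin());       // 25
        System.out.println("Max: " + merged.getMax());       // 35
        System.out.println("Avg: " + merged.getAverage());   // 31.0
    }
}
```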

Using Summary Statistics in Primitive Streams

By primitive streams, I mean the stream specializations for the primitives int, long and double: IntStream, LongStream and DoubleStream. Each of these stream types has a method called summaryStatistics() which returns the appropriate Summary Statistics type.

In other words, calling summaryStatistics() on an

  • IntStream returns an IntSummaryStatistics
  • LongStream returns a LongSummaryStatistics
  • DoubleStream returns a DoubleSummaryStatistics

SummaryStatistics on an IntStream

We are using the same employee list as earlier. We create a stream from the list and map each employee object to a primitive int using the mapToInt method. This method returns an IntStream, on which we call the summaryStatistics() method to obtain an IntSummaryStatistics.

IntSummaryStatistics employeeAgeStatistics = employees.stream()
        .mapToInt(Employee::getAge)
        .summaryStatistics();
        
printIntSummaryStatistics("Employee Age Statistics", employeeAgeStatistics);

Printing it will give the same output as earlier.

Employee Age Statistics
Count: 6
Avg: 29.5
Min: 22
Max: 35
Sum: 177

If we want to use the collect method on an IntStream to build the IntSummaryStatistics, we can write,

IntSummaryStatistics employeeAgeStatistics = employees.stream()
        .mapToInt(Employee::getAge)
        .collect(IntSummaryStatistics::new, IntSummaryStatistics::accept,
               IntSummaryStatistics::combine);

IntSummaryStatistics::new is the supplier, which creates a new IntSummaryStatistics to serve as the mutable result container. The accumulator function is the accept method of IntSummaryStatistics. We use the combine method of IntSummaryStatistics as the combiner to merge two IntSummaryStatistics.

In fact, this is how the summaryStatistics() method is implemented in OpenJDK.
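
The combiner matters mainly for parallel streams, where partial results built on different threads have to be merged. As a sketch under that assumption, the example below runs both approaches on a parallel IntStream of 1 to 1000 and shows that they agree. The class name ParallelCollect and the chosen range are illustrative only.

```java
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;

// Hypothetical demo class: collect(...) versus summaryStatistics() on a parallel stream.
class ParallelCollect {

    // Builds the statistics with an explicit supplier, accumulator and combiner.
    static IntSummaryStatistics viaCollect() {
        return IntStream.rangeClosed(1, 1000)
                .parallel()
                .collect(IntSummaryStatistics::new,
                        IntSummaryStatistics::accept,
                        IntSummaryStatistics::combine);
    }

    // Builds the same statistics with the convenience method.
    static IntSummaryStatistics viaSummaryStatistics() {
        return IntStream.rangeClosed(1, 1000)
                .parallel()
                .summaryStatistics();
    }

    public static void main(String[] args) {
        System.out.println("Sum via collect: " + viaCollect().getSum());                     // 500500
        System.out.println("Sum via summaryStatistics: " + viaSummaryStatistics().getSum()); // 500500
    }
}
```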

SummaryStatistics on a DoubleStream

Creating a DoubleSummaryStatistics or a LongSummaryStatistics from a DoubleStream or a LongStream should now be straightforward. Here, I have shown examples of creating a DoubleSummaryStatistics for the salaries of all employees and for the salaries of employees in Berlin only. In the latter case, we apply a filter predicate to keep only the employees who are located in Berlin.

DoubleSummaryStatistics employeeSalaryStatistics = employees.stream()
        .mapToDouble(Employee::getSalary)
        .summaryStatistics();
printDoubleSummaryStatistics("Employee Salary Statistics", employeeSalaryStatistics);

DoubleSummaryStatistics employeeSalaryStatisticsInBerlin = employees.stream()
        .filter(employee -> employee.getCity().equals("Berlin"))
        .mapToDouble(Employee::getSalary)
        .summaryStatistics();
printDoubleSummaryStatistics("Employee Salary Statistics for Berlin Employees", employeeSalaryStatisticsInBerlin);

public static void printDoubleSummaryStatistics(String message, DoubleSummaryStatistics doubleSummaryStatistics) {
    System.out.println(message);
    System.out.println("Count: " + doubleSummaryStatistics.getCount());
    System.out.println("Avg: " + doubleSummaryStatistics.getAverage());
    System.out.println("Min: " + doubleSummaryStatistics.getMin());
    System.out.println("Max: " + doubleSummaryStatistics.getMax());
    System.out.println("Sum: " + doubleSummaryStatistics.getSum());
    System.out.println();
}

This outputs,

Employee Salary Statistics
Count: 6
Avg: 11333.333333333334
Min: 9000.0
Max: 13000.0
Sum: 68000.0

Employee Salary Statistics for Berlin Employees
Count: 3
Avg: 12333.333333333334
Min: 12000.0
Max: 13000.0
Sum: 37000.0

SummaryStatistics on a LongStream

In the example below, we create a LongSummaryStatistics from LongStream.range(1, 100). Since the upper bound of range is exclusive, the stream contains the numbers 1 to 99.

LongSummaryStatistics longSummaryStatistics = LongStream.range(1, 100)
        .summaryStatistics();
printLongSummaryStatistics("LongSummaryStatistics for range 1-100", longSummaryStatistics);


public static void printLongSummaryStatistics(String message, LongSummaryStatistics longSummaryStatistics) {
    System.out.println(message);
    System.out.println("Count: " + longSummaryStatistics.getCount());
    System.out.println("Avg: " + longSummaryStatistics.getAverage());
    System.out.println("Min: " + longSummaryStatistics.getMin());
    System.out.println("Max: " + longSummaryStatistics.getMax());
    System.out.println("Sum: " + longSummaryStatistics.getSum());
    System.out.println();
}

This outputs,

LongSummaryStatistics for range 1-100
Count: 99
Avg: 50.0
Min: 1
Max: 99
Sum: 4950
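
LongStream.range excludes its upper bound, which is why the output above stops at 99. If the intent is to cover 1 to 100 inclusive, LongStream.rangeClosed is one option. A minimal sketch (the class name ClosedRangeStatistics is hypothetical):

```java
import java.util.LongSummaryStatistics;
import java.util.stream.LongStream;

// Hypothetical demo class: statistics over the inclusive range 1 to 100.
class ClosedRangeStatistics {

    // rangeClosed includes both bounds, so 100 values are recorded.
    static LongSummaryStatistics oneToHundred() {
        return LongStream.rangeClosed(1, 100).summaryStatistics();
    }

    public static void main(String[] args) {
        LongSummaryStatistics stats = oneToHundred();
        System.out.println("Count: " + stats.getCount()); // 100
        System.out.println("Avg: " + stats.getAverage()); // 50.5
        System.out.println("Min: " + stats.getMin());     // 1
        System.out.println("Max: " + stats.getMax());     // 100
        System.out.println("Sum: " + stats.getSum());     // 5050
    }
}
```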

Creating SummaryStatistics using Stream.collect

This section explores how we can create Summary Statistics using the Stream’s collect method.

Collectors.summarizingInt

summarizingInt is a static method in the Collectors class. It returns a Collector which applies an int-producing mapping function (a ToIntFunction) to each element of the stream and produces an IntSummaryStatistics.

IntSummaryStatistics employeeAgeStatistics = employees.stream()
        .collect(Collectors.summarizingInt(Employee::getAge));
printIntSummaryStatistics("Employee Age Statistics", employeeAgeStatistics);
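
Since summarizingInt returns a Collector, it can also be passed as a downstream collector to other collectors. As one possible illustration, the sketch below groups employees by city with Collectors.groupingBy and computes age statistics per city. The class AgeStatisticsByCity and its pared-down Employee type are stand-ins written for this example, not the classes shown earlier in the post.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class: summarizingInt used as a downstream collector.
class AgeStatisticsByCity {

    // Pared-down stand-in for the Employee class used in this post.
    static final class Employee {
        private final String city;
        private final int age;

        Employee(String city, int age) {
            this.city = city;
            this.age = age;
        }

        String getCity() {
            return city;
        }

        int getAge() {
            return age;
        }
    }

    // Same cities and ages as the employee list in this post.
    static List<Employee> sampleEmployees() {
        return Arrays.asList(
                new Employee("Berlin", 25), new Employee("Sydney", 22),
                new Employee("Berlin", 29), new Employee("London", 34),
                new Employee("London", 32), new Employee("Berlin", 35));
    }

    // Groups by city and summarizes the ages within each group.
    static Map<String, IntSummaryStatistics> ageStatisticsByCity(List<Employee> employees) {
        return employees.stream()
                .collect(Collectors.groupingBy(Employee::getCity,
                        Collectors.summarizingInt(Employee::getAge)));
    }

    public static void main(String[] args) {
        ageStatisticsByCity(sampleEmployees())
                .forEach((city, stats) -> System.out.println(city + ": " + stats));
    }
}
```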

Collectors.summarizingLong and Collectors.summarizingDouble

Similarly, there are summarizingLong and summarizingDouble methods, which take a ToLongFunction and a ToDoubleFunction respectively, and produce a LongSummaryStatistics and a DoubleSummaryStatistics, respectively.

DoubleSummaryStatistics employeeSalaryStatistics = employees.stream()
        .collect(Collectors.summarizingDouble(Employee::getSalary));
printDoubleSummaryStatistics("Employee Salary Statistics", employeeSalaryStatistics);

LongSummaryStatistics longSummaryStatistics = List.of(1L, 2L, 3L, 4L, 5L)
        .stream()
        .collect(Collectors.summarizingLong(Long::longValue));
printLongSummaryStatistics("LongSummaryStatistics for range 1-5", longSummaryStatistics);

The last segment’s (for LongSummaryStatistics) output is:

LongSummaryStatistics for range 1-5
Count: 5
Avg: 3.0
Min: 1
Max: 5
Sum: 15

Passing a supplier, accumulator and a combiner

If we want to pass supplier, accumulator and combiner functions to the Stream’s collect method, it would look like,

IntSummaryStatistics employeeAgeStatistics  = employees.stream()
        .collect(() -> new IntSummaryStatistics(),
                (intSummaryStatistics, employee) -> intSummaryStatistics.accept(employee.getAge()),
                (intSummaryStatistics1, intSummaryStatistics2) -> intSummaryStatistics1.combine(intSummaryStatistics2));
printIntSummaryStatistics("Employee Age Statistics", employeeAgeStatistics);

  • The lambda expression () -> new IntSummaryStatistics() is the supplier of the result container.
  • The accumulator function calls the accept method of the IntSummaryStatistics instance.
  • The combiner merges two IntSummaryStatistics using the combine method.

We can simplify this using method references as,

IntSummaryStatistics employeeAgeStatistics = employees.stream()
        .collect(IntSummaryStatistics::new,
                (intSummaryStatistics, employee) -> intSummaryStatistics.accept(employee.getAge()),
                IntSummaryStatistics::combine);
printIntSummaryStatistics("Employee Age Statistics", employeeAgeStatistics);

Conclusion

We have covered the Summary Statistics classes in Java (IntSummaryStatistics, LongSummaryStatistics, and DoubleSummaryStatistics). We saw why these classes are useful, looked at examples of each, and learnt how to create them from Java streams.
