JMH benchmark with examples

Introduction

I earlier wrote a post on the wrong way to measure the execution time of code in Java, along with a naive benchmarking approach. That post did not show the right way to do it; this post does exactly that. We will look at JMH, the Java Microbenchmark Harness.

Here is the JMH definition from OpenJDK:

JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks written in Java and other languages targetting the JVM.

https://openjdk.java.net/projects/code-tools/jmh/

JMH is a Java harness library for writing benchmarks on the JVM, developed as part of the OpenJDK project. It provides a solid foundation for writing and running benchmarks whose results are not distorted by unwanted JVM optimizations such as dead-code elimination and constant folding.

Creating a JMH project using Maven

We can use Maven's archetype:generate goal with the JMH archetype to create a JMH project:

mvn archetype:generate \
    -DinteractiveMode=false \
    -DarchetypeGroupId=org.openjdk.jmh \
    -DarchetypeArtifactId=jmh-java-benchmark-archetype \
    -DgroupId=com.javadevcentral.jmh.demo \
    -DartifactId=javadevcentral-jmh-demo \
    -Dversion=1.0-SNAPSHOT

The archetypeGroupId must be org.openjdk.jmh and the archetypeArtifactId must be jmh-java-benchmark-archetype. Set artifactId to the name you want for the project, and groupId to a value that uniquely identifies the project's artifacts.

This creates a folder named after the artifactId. The groupId is used as the Java package name (com.javadevcentral.jmh.demo here). You can open this project in your IDE of choice, or even use a plain vi editor to write the benchmarks.

Anatomy of a JMH benchmark

Let us write a hello-world benchmark to learn the structure of JMH benchmark code.

package com.javadevcentral.jmh.demo;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;

import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class HelloWorldBenchmark {

    @Benchmark
    public int helloWorldBenchmark() {
        return 1;
    }
}

This is a basic benchmark that does nothing useful: it just returns a constant. The method to benchmark is annotated with @Benchmark.

To run the benchmark, run mvn install (or mvn clean install) in the project root directory. This compiles the benchmark and packages it, together with JMH itself, into a self-contained jar, target/benchmarks.jar. We can then run the benchmark as

java -jar target/benchmarks.jar HelloWorldBenchmark

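This runs every benchmark whose name matches the argument passed to benchmarks.jar (the argument is treated as a regular expression), which here means all the benchmark methods in HelloWorldBenchmark. We did not specify the number of forks, warmup iterations or measurement iterations, so the defaults of the JMH version used for this post apply: ten trials (forks), each with twenty warmup iterations and twenty measurement iterations of one second each (I will explain these terms shortly). Newer JMH releases use different defaults, so check the header of your own output. The benchmark output looks like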

# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: com.javadevcentral.jmh.demo.HelloWorldBenchmark


# Run progress: 0.00% complete, ETA 00:06:40
# Fork: 1 of 10
# Warmup Iteration   1: 1537143.550 ops/ms
# Warmup Iteration   2: 1543728.128 ops/ms
....
# Warmup Iteration  20: 1333021.373 ops/ms
Iteration   1: 1409472.009 ops/ms
Iteration   2: 1546378.320 ops/ms
....
Iteration  20: 1449046.725 ops/ms

....
....
# Run progress: 10.00% complete, ETA 00:07:24
# Fork: 10 of 10
# Warmup Iteration   1: 1423934.529 ops/ms
# Warmup Iteration   2: 1504157.585 ops/ms
....
# Warmup Iteration  20: 1543213.052 ops/ms
Iteration   1: 1491889.932 ops/ms
Iteration   2: 1541406.700 ops/ms
....
Iteration  20: 1530902.449 ops/ms

Result: 1468664.277 ±(99.9%) 40196.447 ops/ms [Average]
  Statistics: (min, avg, max) = (481536.386, 1468664.277, 1627095.833), stdev = 170194.274
  Confidence interval (99.9%): [1428467.830, 1508860.724]


# Run complete. Total time: 00:08:06

Benchmark                                         Mode  Samples        Score  Score error   Units
c.j.j.d.HelloWorldBenchmark.helloWorldBenchmark    thrpt      200  1468664.277    40196.447  ops/ms

After completing the benchmark, JMH shows the result at the bottom, including the average, minimum and maximum scores and the standard deviation. Here, our method ran about 1.47 million times per millisecond on average (1468664.277 ops/ms).
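The ±(99.9%) value next to the score is the half-width of a 99.9% confidence interval. The figures above are consistent with the usual Student-t formula, error = t × stdev / √n, where n = 10 forks × 20 measurement iterations = 200 samples and t ≈ 3.340 is the critical value for 199 degrees of freedom. The plain-Java sketch below reproduces the number. The class name ScoreError is illustrative, and the t value is an assumption looked up in a t-table and passed in, since the JDK has no built-in Student-t distribution.

```java
// Illustrative re-computation of the "±(99.9%)" score error (not JMH code).
final class ScoreError {

    // Half-width of a two-sided confidence interval: t * stdev / sqrt(n).
    // tCritical must match the confidence level and (samples - 1) degrees of freedom;
    // here it is supplied by the caller (assumed to come from a t-table).
    static double halfWidth(double tCritical, double stdev, int samples) {
        if (samples < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        return tCritical * stdev / Math.sqrt(samples);
    }

    public static void main(String[] args) {
        // HelloWorldBenchmark: stdev = 170194.274 ops/ms over 200 samples,
        // t(0.9995, df = 199) is roughly 3.340 (taken from a t-table)
        double error = halfWidth(3.340, 170194.274, 200);
        double mean = 1468664.277;
        System.out.printf("error ~ %.0f ops/ms, interval ~ [%.0f, %.0f]%n",
                error, mean - error, mean + error);
    }
}
```

The printed interval agrees with the reported [1428467.830, 1508860.724] up to the rounding of t.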

 
Note: You can name the benchmark method as you like.

Terminology – Trial, Warmup and Iteration

Trial:
The JMH benchmark is run for a number of trials, also called forks. Each fork runs in a separate, freshly started JVM.

Warmup:
Within each fork, a number of iterations are run as warmup. They give the JVM time to warm up the code we are measuring (for example, through JIT compilation), so that the measured iterations are not distorted by fluctuations caused by start-up effects.

Iteration:
These are the measurement iterations, in which the benchmark code is actually measured. Only the numbers from these iterations make up the JMH benchmark result.

Each warmup iteration and measurement iteration runs for a fixed amount of time, during which JMH calls the benchmark method repeatedly.
The picture below helps to visualize this. A JMH fork consists of a set of warmup iterations (w) and a set of measurement iterations (m). The x-axis is the iteration number and the y-axis is the time.
[Figure: Java microbenchmark harness terminology]

Annotations

Let us look at the main annotations available when writing a benchmark with JMH.

BenchmarkMode

This specifies the mode in which the benchmark is run. It takes an array of values, so more than one mode can be given. Possible values are:

Throughput: (default mode) Measures the throughput of a piece of code, that is, the number of times a method is executed per unit of time. It suits methods whose individual calls are short, so that many calls fit in one iteration.
AverageTime: Measures the average time the method takes to execute.
SampleTime: Samples the time of individual operations and reports their distribution: percentiles such as p50, p90 and p99, along with min and max times.
SingleShotTime: Measures the time of a single operation per iteration. Use this when you also want to account for cold-start time.
All: Measures all of the above.

The most commonly used modes are Throughput and AverageTime.

OutputTimeUnit

This sets the time unit in which the results are reported. It takes a value of Java's java.util.concurrent.TimeUnit enum, so any of its constants can be used.
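OutputTimeUnit only changes how the same measurement is expressed. As a plain-Java sketch (ThroughputConversion is a hypothetical helper, not part of JMH), a throughput score can be turned into an average time per operation using TimeUnit. For the HelloWorldBenchmark score of 1468664.277 ops/ms, this comes to roughly 0.68 ns per call.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of JMH): converts a throughput score into time per operation.
final class ThroughputConversion {

    // opsPerUnit: throughput score reported in operations per `unit`.
    // Returns the average time per operation in nanoseconds.
    static double nanosPerOp(double opsPerUnit, TimeUnit unit) {
        if (opsPerUnit <= 0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        double nanosPerUnit = unit.toNanos(1); // e.g. 1 ms = 1_000_000 ns
        return nanosPerUnit / opsPerUnit;
    }

    public static void main(String[] args) {
        // Score from the HelloWorldBenchmark run: 1468664.277 ops/ms
        double ns = nanosPerOp(1468664.277, TimeUnit.MILLISECONDS);
        System.out.printf("%.3f ns/op%n", ns); // prints 0.681 ns/op
    }
}
```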

Benchmark

Annotate the benchmark method with @Benchmark. The method to benchmark must follow the rules below:

  • It must be public.
  • Its arguments may only be JMH State objects or instances of JMH's Control or Blackhole classes.

We will cover State and Blackhole later in this post. The second point means that the benchmark method cannot take parameters of any other type; each parameter has to be one of the three mentioned.

Fork

This sets the number of trials, or forks. The value field is the number of forks whose results are measured. The warmups field sets the number of additional forks that are run purely as warmup, with their results discarded.

Example: @Fork(value = 5, warmups = 2)
Here JMH first runs 2 warmup forks, whose results are ignored, and then 5 measured forks.

Measurement

It sets the default measurement parameters for the benchmark: the number of measurement iterations and how long each iteration runs.

Example: @Measurement(iterations = 3, time = 1000, timeUnit = TimeUnit.MILLISECONDS)
Here we specify 3 iterations, each running for 1000 milliseconds (1 second). The default time unit is seconds.
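The time parameter is a budget for the iteration rather than a fixed invocation count: the benchmark method is called repeatedly until the time is used up. The plain-Java sketch below models this with a simulated clock so the result is deterministic. The TimedIteration class and its assumptions (every call takes the same time, and the budget is checked only before each call) are ours for illustration; this is not how JMH is implemented internally.

```java
// Simplified model of one time-based iteration (hypothetical helper, not JMH code).
// Assumptions: every invocation takes the same time, and the time budget is only
// checked before starting an invocation, so a running invocation always finishes.
final class TimedIteration {

    // Returns how many invocations start within one iteration of budgetMillis
    // when each invocation takes opMillis.
    static int invocationsPerIteration(long budgetMillis, long opMillis) {
        if (budgetMillis <= 0 || opMillis <= 0) {
            throw new IllegalArgumentException("budget and op time must be positive");
        }
        long elapsed = 0;      // simulated clock, in milliseconds
        int invocations = 0;
        while (elapsed < budgetMillis) { // start another call only if time is left
            elapsed += opMillis;         // "run" the benchmark method once
            invocations++;
        }
        return invocations;
    }

    public static void main(String[] args) {
        // A 1 s iteration of a method that takes 100 ms
        System.out.println(invocationsPerIteration(1000, 100));  // 10
        // A 1 s iteration of a method that takes 1.5 s still runs it once
        System.out.println(invocationsPerIteration(1000, 1500)); // 1
    }
}
```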

Warmup

@Warmup takes the same parameters as @Measurement, except that they apply to the warmup iterations.

Example: @Warmup(iterations = 3, time = 2)
This configures 3 warmup iterations, each running for 2 seconds.
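Together, @Fork, @Warmup and @Measurement determine how many samples a run produces and how long it takes at minimum. The plain-Java sketch below (RunPlan is a hypothetical helper) multiplies these out. It assumes that warmup forks run the same iterations as measured forks, that warmup and measurement iterations have the same length, and it ignores JVM start-up and other overhead. For the default hello-world run (10 forks, 20 warmup and 20 measurement iterations of 1 s), it gives 200 samples and 400 s, matching the Samples column (200) and the initial ETA of 00:06:40. The reported total of 00:08:06 is longer because the estimate leaves out that overhead.

```java
// Hypothetical helper: multiplies out a JMH run configuration for one benchmark method.
// Assumes warmup forks run the same iterations as measured forks and that warmup and
// measurement iterations last equally long; JVM start-up and per-iteration overhead are
// ignored, so the time is a lower bound.
final class RunPlan {
    private final int measuredForks;          // @Fork(value = ...)
    private final int warmupForks;            // @Fork(warmups = ...)
    private final int warmupIterations;       // @Warmup(iterations = ...)
    private final int measurementIterations;  // @Measurement(iterations = ...)
    private final long secondsPerIteration;   // time of each iteration

    RunPlan(int measuredForks, int warmupForks, int warmupIterations,
            int measurementIterations, long secondsPerIteration) {
        this.measuredForks = measuredForks;
        this.warmupForks = warmupForks;
        this.warmupIterations = warmupIterations;
        this.measurementIterations = measurementIterations;
        this.secondsPerIteration = secondsPerIteration;
    }

    // One sample per measurement iteration of every measured fork.
    int samples() {
        return measuredForks * measurementIterations;
    }

    // Minimum wall-clock time in seconds.
    long minimumSeconds() {
        long forks = measuredForks + warmupForks;
        return forks * (warmupIterations + measurementIterations) * secondsPerIteration;
    }

    public static void main(String[] args) {
        RunPlan defaults = new RunPlan(10, 0, 20, 20, 1);  // HelloWorldBenchmark run
        RunPlan stringJoin = new RunPlan(2, 0, 5, 10, 1);  // each StringJoinBenchmark method
        System.out.println(defaults.samples() + " samples, " + defaults.minimumSeconds() + " s");     // 200 samples, 400 s
        System.out.println(stringJoin.samples() + " samples, " + stringJoin.minimumSeconds() + " s"); // 20 samples, 30 s
    }
}
```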

A real JMH benchmark

Having seen the theory and terminology of JMH, let us benchmark some real code and look at the results. In Java, Strings are immutable, so every concatenation with + creates a new String instance and copies the characters accumulated so far into it. When concatenating strings in a loop, we should therefore use a StringBuilder. Let us write two methods, one that concatenates strings with + and one that uses a StringBuilder, and then benchmark both to get their running times.

package com.javadevcentral.jmh.demo;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Warmup;

import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class StringJoinBenchmark {

    @Benchmark
    @Fork(value = 2)
    @Measurement(iterations = 10, time = 1)
    @Warmup(iterations = 5, time = 1)
    public String stringConcat() {
        String result = "";
        for (int i = 0; i < 1000; i++) {
            result += String.valueOf(i);
        }
        return result;

    }

    @Benchmark
    @Fork(value = 2)
    @Measurement(iterations = 10, time = 1)
    @Warmup(iterations = 5, time = 1)
    public String concatUsingStringBuilder() {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            result.append(i);
        }
        return result.toString();

    }
}

We benchmark the two methods with 2 forks, 5 warmup iterations and 10 measurement iterations, each running for 1 second. We run the benchmark in AverageTime mode to get the average running time of each method. The results are as follows.
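Before trusting a comparison like this, it is worth checking that both methods do the same work. The plain-Java check below (StringJoinCheck is an illustrative name; it copies the two method bodies and does not need JMH) confirms that both produce the same 2,890-character string.

```java
// Plain-Java sanity check (no JMH): confirm both implementations return the same result,
// so the benchmark compares equivalent work.
final class StringJoinCheck {

    // Same logic as StringJoinBenchmark.stringConcat()
    static String stringConcat() {
        String result = "";
        for (int i = 0; i < 1000; i++) {
            result += String.valueOf(i);
        }
        return result;
    }

    // Same logic as StringJoinBenchmark.concatUsingStringBuilder()
    static String concatUsingStringBuilder() {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            result.append(i);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String a = stringConcat();
        String b = concatUsingStringBuilder();
        if (!a.equals(b)) {
            throw new IllegalStateException("Implementations disagree");
        }
        System.out.println("Both produce " + a.length() + " characters"); // 2890
    }
}
```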

For concatUsingStringBuilder:

Result: 0.017 ±(99.9%) 0.001 ms/op [Average]
  Statistics: (min, avg, max) = (0.015, 0.017, 0.019), stdev = 0.001
  Confidence interval (99.9%): [0.016, 0.018]

For stringConcat:

Result: 1.697 ±(99.9%) 0.117 ms/op [Average]
  Statistics: (min, avg, max) = (1.529, 1.697, 2.000), stdev = 0.135
  Confidence interval (99.9%): [1.580, 1.813]

The summary is:

Benchmark                                             Mode  Samples  Score  Score error  Units
c.j.j.d.StringJoinBenchmark.concatUsingStringBuilder    avgt       20  0.017        0.001  ms/op
c.j.j.d.StringJoinBenchmark.stringConcat                avgt       20  1.697        0.117  ms/op

The stringConcat method takes about 1.7 milliseconds per call (1.697 ms/op), whereas concatUsingStringBuilder takes only 0.017 milliseconds, which makes it roughly 100 times faster.
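The gap comes from how much copying each approach does. The sketch below counts characters copied under a simplified model we assume for illustration: each result += String.valueOf(i) produces a new String holding everything accumulated so far, while StringBuilder writes only the new characters (its occasional buffer regrowth is ignored) plus one copy in toString(). For 1,000 appends this gives 1,396,495 characters versus 5,780, so the naive loop's work grows quadratically with the number of appends while the builder's grows linearly. The model counts characters, not nanoseconds, so its ratio is not expected to match the measured one.

```java
// Simplified cost model (illustrative, not a measurement): counts characters written
// when appending String.valueOf(i) for i = 0 .. n-1.
final class ConcatCostModel {

    // Naive "result += s": every step creates a new String containing all characters so far.
    static long charsCopiedByPlusConcat(int n) {
        long length = 0;
        long copied = 0;
        for (int i = 0; i < n; i++) {
            length += String.valueOf(i).length();
            copied += length; // the new String holds the entire accumulated result
        }
        return copied;
    }

    // StringBuilder: each step writes only the new characters (buffer regrowth ignored),
    // plus one final copy of the whole result in toString().
    static long charsCopiedByBuilder(int n) {
        long length = 0;
        for (int i = 0; i < n; i++) {
            length += String.valueOf(i).length();
        }
        return 2 * length; // appends + final toString() copy
    }

    public static void main(String[] args) {
        System.out.println("+ in a loop:   " + charsCopiedByPlusConcat(1000)); // 1396495
        System.out.println("StringBuilder: " + charsCopiedByBuilder(1000));    // 5780
    }
}
```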

JMH State objects

As I said earlier, a benchmark method can only take parameters that are State objects, Control or Blackhole; we cannot pass arbitrary parameters. Instead, we can pass arbitrary data using the @State annotation.

@State is a class-level annotation. Any class annotated with @State can be passed as an argument to the benchmark method. The class must be public, and if it is a nested class, it must also be static.

State Scope

The @State annotation takes a Scope value, which sets the scope of the annotated class. It can be one of the following:
Benchmark: All benchmark threads share the same object.
Group: The object is shared between all threads of a thread group.
Thread: Each thread running the benchmark gets its own instance of the state object.

Setup and Teardown

A State class can have methods annotated with @Setup and @TearDown. JMH runs the @Setup method to set up the state object and the @TearDown method to clean up the resources (if any) held by it.

Setup and Teardown Levels

The @Setup and @TearDown annotations accept a Level, which decides how often the methods are executed. There are three levels:
Trial: Called once per trial (fork).
Iteration: Called once per iteration (including warmup iterations).
Invocation: Called for every invocation of the benchmark method. Use it with care: the setup and teardown work runs between measured calls, so anything expensive done there can distort the benchmark results.
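One way to see why Invocation needs care is to count how often a setup method runs at each level. The sketch below uses the StringJoinBenchmark configuration (2 forks, 5 warmup and 10 measurement iterations) with a single benchmark thread and a Scope.Thread state. SetupLevel is our own stand-in for JMH's Level enum, and the fixed number of invocations per iteration (1,000 here) is a made-up simplification, since in a time-based iteration the count depends on how fast the method is.

```java
// Illustrative counting model; SetupLevel stands in for org.openjdk.jmh.annotations.Level.
// Assumes one benchmark thread, a Scope.Thread state, and the same (hypothetical)
// number of invocations in every iteration.
final class SetupCalls {

    enum SetupLevel { TRIAL, ITERATION, INVOCATION }

    // How many times a @Setup method at the given level runs for one benchmark method.
    static long setupCalls(SetupLevel level, int forks, int warmupIterations,
                           int measurementIterations, long invocationsPerIteration) {
        long iterations = (long) forks * (warmupIterations + measurementIterations);
        switch (level) {
            case TRIAL:
                return forks;                                // once per fork
            case ITERATION:
                return iterations;                           // before every iteration, warmup included
            case INVOCATION:
                return iterations * invocationsPerIteration; // before every single call
            default:
                throw new IllegalArgumentException("unknown level: " + level);
        }
    }

    public static void main(String[] args) {
        for (SetupLevel level : SetupLevel.values()) {
            // TRIAL: 2, ITERATION: 30, INVOCATION: 30000
            System.out.println(level + ": " + setupCalls(level, 2, 5, 10, 1_000));
        }
    }
}
```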

Example of using State object

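package com.javadevcentral.jmh.demo;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;

import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class PassingStateBenchmark {

    // One instance per benchmark thread; nested State classes must be public static
    @State(Scope.Thread)
    public static class MyState {
        String a, b;

        // Runs before every iteration (warmup and measurement)
        @Setup(Level.Iteration)
        public void setup() {
            a = "some-val";
            b = "some-val2";
        }

        // Runs after every iteration to reset or release resources
        @TearDown(Level.Iteration)
        public void teardown() {
            a = b = "";
        }
    }

    @Benchmark
    public String benchmark(MyState myState) {
        // Do whatever is needed with myState
        String res = myState.a + myState.b;
        // Return the result so the JVM cannot eliminate the computation as dead code
        return res;
    }
}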

Comparing with our naive benchmarking framework

Let us look back at the problems of the naive benchmarking framework we developed earlier.

#1 – Isolating runs

We learnt that we should not run multiple benchmarks in the same JVM. With JMH we need not worry about this: each fork runs in its own JVM, and each benchmark method goes through multiple forks with warmup and measurement iterations. The variations in measurements we had before are no longer a problem.

#2 – Dead code

The JVM can perform dead-code elimination: if a computed value is never used afterwards, the JVM can skip the computation altogether. If the whole point of our benchmark is to measure the time taken to compute that value, the result will be entirely wrong.

JMH does not get around the dead-code problem for us entirely, though.

If our benchmark method computes a single value or result (like our string join methods), we must return it. Otherwise, the JVM can skip the computation since the result is never used, and the JMH benchmark result will be wrong. JMH treats a returned value as used (it passes it to a Blackhole internally), so returning it is enough.

So, it is imperative that we return the result of the computation. But what if the benchmark code produces multiple values?

Returning multiple values

If we want to return more than one value from a benchmark, we can use the Blackhole class. Add a Blackhole parameter to the benchmark method, and then pass each value that must not be eliminated to the Blackhole's consume method.

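@Benchmark
public void benchmark(MyState myState, Blackhole blackhole) { // org.openjdk.jmh.infra.Blackhole
    // Computation 1
    String res = myState.a + myState.b;
    blackhole.consume(res);
    // Computation 2: a second, independent value
    int totalLength = myState.a.length() + myState.b.length();
    blackhole.consume(totalLength);
}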

#3 – Constant folding

The last problem in our naive benchmark was constant folding. When a computation uses hard-coded values, the JIT compiler can see that its inputs never change, compute the result once, and replace the computation with that constant result. The benchmark then measures almost nothing.

This can happen when the benchmark method computes something from a set of constants. To avoid it, we can use the State classes we saw earlier: move the constant inputs into non-final fields of a State class, take the State object as a parameter, and compute from its fields in the benchmark method.

Conclusion

We learnt what JMH is and how to write and run a JMH benchmark. First, we saw how to set up a JMH project using Maven. Second, we learnt the anatomy of a JMH benchmark. Third, we looked at the terminology and the annotations of JMH. Next, we compared the runtime performance of String concatenation using + in a loop and using a StringBuilder. We also looked at how to pass arguments using State classes.

Finally, we revisited the problems caused by JVM optimizations in our naive benchmarking approach and saw how to overcome them with JMH.

Try to use JMH for your project to measure the efficiency of your code. Let me know your comments or thoughts on this.


This Post Has 2 Comments

  1. Chen

    Thanks for the detailed explanation here!
    One question: why do we need to set up the “time” parameter in @Measurement? Shouldn’t we wait for the benchmark method to finish by itself and start the next iteration?
    What does it mean if we set the time = 1s? Does it mean if the method does not finish in 1s we will force it to finish?

    1. admin

      Good question Chen.

      It is the time we want each iteration to take (in the JMH Fork image, it is the y-axis value).

      For example: If your method takes 100 milliseconds and if we specify @Measurement’s time parameter as 1 second, then each iteration will execute your method 10 times (roughly).

      If we specify a time value that is less than the time a single execution takes, JMH will not force the method to exit; the execution still runs to completion. In other words, the time acts as a budget: JMH (the framework) checks whether time is left before beginning each run/execution, so an iteration can run somewhat over the specified time.
