Measuring Code Execution Time - The Wrong Way

Introduction

We have all wanted to know how fast a piece of code or algorithm runs. We want to calculate the running time of a code either to know its efficiency or to compare two or more algorithms to choose the best one. This post is part of a series of posts that explains the pitfalls of the usual ways in measuring running time of a piece of code in Java, why it is a wrong way to measure execution/running time and the correct way to measure the execution time of a code in Java.

Measuring execution time

You have written a code that does something and want to measure the execution time of that code to find its efficiency. You might have two or more different code that does the same job. So, you measure the running time of each code to decide which one is the best. This process of finding the execution time is called benchmarking.

We have the code for insertion sort and we would like to measure its execution time. Insertion sort is one of the sorting algorithms with an average and worst case runtime of O(n^2). Its best case is O(n) and that happens when the input is already sorted. The worst case, O(n^2), is when the input is reverse sorted.

We also have code for quick sort. Its running time is O(n log n) (best, worst and average).

For the rest of the discussion, we would sort the array in ascending order and reverse order means descending order.

The sorting algorithms

The code for the insertion and quick sort are placed in a class as static methods.

public class Sorts {
    public static void insertionSort(int[] arr) {
        int n = arr.length;
        for (int i = 1; i < n; i++) {
            int key = arr[i];
            int j = i - 1;

            while (j >= 0 && arr[j] > key) {
                arr[j + 1] = arr[j];
                j = j - 1;
            }
            arr[j + 1] = key;
        }
    }

    private static int partition(int[] arr, int low, int high) {
        int pivot = arr[high];
        int i = (low - 1);
        for (int j = low; j < high; j++) {
            if (arr[j] <= pivot) {
                i++;
                int temp = arr[i];
                arr[i] = arr[j];
                arr[j] = temp;
            }
        }
        int temp = arr[i + 1];
        arr[i + 1] = arr[high];
        arr[high] = temp;

        return i + 1;
    }


    private static void quickSort(int[] arr, int low, int high) {
        if (low < high) {
            int p = partition(arr, low, high);

            quickSort(arr, low, p - 1);
            quickSort(arr, p + 1, high);
        }
    }
    public static void quickSort(int[] arr) {
        quickSort(arr, 0, arr.length - 1);
    }
}

Very naive way to benchmark code

The normal way, which I call the very naive way, is to measure the running time is to use a clock based approach. We would measure the (clock) time at the start and the time when the method/code run has completed and subtract the end time from the start. Typically, we use System.currentTimeMillis for this. But using System.nanoTime is more accurate than currentTimeMillis.

A typical benchmark code looks like,

long start = System.nanoTime();
Sorts.insertionSort(input);
long end = System.nanoTime();
System.out.println("Took (in ns)" + (end - start));

This gives the time it took in nano seconds. To get the time in milliseconds, we have to divide the result of the difference by 100_000 (10^6).

The rest of the post talks about the problems in the approach and why it must not be done to get accurate results.

We start off by constructing a list of random numbers. Next, we run our (very) naive benchmark logic in a loop. Next, we extend it to compare the average case and the worst case of Insertion sort. Finally, we compare insertion sort with merge sort.

Building our data set

Let us construct a list of 10,000 numbers as our data set. I have made the upper bound (max number) of the number as 1,000,000 (1 million).

private static int[] buildArray(int size) {
Random random = new Random();
return IntStream.range(0, size)
        .map(i -> random.nextInt(1000000))
        .toArray();
}

Measuring performance in a loop

Let us run the benchmarking code of measuring the execution time in a loop.

private static void measurePerformance(int[] input) {
    for (int i = 0; i < 100; i++) {
        int[] newArray = Arrays.copyOf(input, input.length);
        long start = System.nanoTime();
        Sorts.insertionSort(newArray);
        long end = System.nanoTime();
        System.out.println("Sorting took " + (end - start) / 1000_000f + " ms ");
    }
    
}

Call it as

measurePerformance(buildArray(10000));

The reason we are copying the array before sorting is that our insertion sort sorts the array in place i.e., after the sort method returns the passed array will be in ascending order (sorted). This will affect the measurement on subsequent runs. Hence, we make a copy of the input array before each call.

This prints like (the exact result will definitely vary from run to run and from machine to machine)

Sorting took 28.479357 ms 
Sorting took 17.107634 ms 
Sorting took 16.817558 ms 
...
Sorting took 22.585983 ms 
Sorting took 38.791553 ms 
Sorting took 12.791842 ms 
Sorting took 19.27184 ms 
Sorting took 41.42186 ms 
Sorting took 33.064342 ms 
Sorting took 12.735878 ms 
Sorting took 21.854004 ms 
...
Sorting took 11.685804 ms 
Sorting took 11.447527 ms 
Sorting took 11.473101 ms

There is a lot of deviations in the timing numbers reported. The time varies from 11 ms to a max of 41 ms. It seems to get settled down towards the end where the time is around 11 ms.

So, what we do to arrive at a single number as the running or execution time? We run it in a loop as above (and maybe more number of times say 1000 or 10,000) and take the average. This does not give us the right number as we are taking into account the occasional noisy readings as well.

Let us see a couple of extensions of this where the noise is evident.

Performance of insertion sort when the input is reverse sorted

In this, we invoke selection sort with the sorted array reversed. In other words, we are simulating its worst case runtime.

private static void measurePerformanceForAvgAndWorstCase(int[] input) {
    for (int i = 0; i < 100; i++) {
        int[] newArray = Arrays.copyOf(input, input.length);
        long start = System.nanoTime();
        Sorts.insertionSort(newArray); //avg case
        long end = System.nanoTime();
        long result1 = (end - start) / 1000_000L;

        //Reverse the sorted array
        int[] reversedArray = new int[newArray.length];
        for (int j = newArray.length - 1, k = 0; j >= 0; j--) {
            reversedArray[k++] = newArray[j];
        }
        start = System.nanoTime();
        Sorts.insertionSort(reversedArray); //worst case
        end = System.nanoTime();
        long result2 = (end - start) / 1000_000L;
        System.out.println(result1 + "," + result2);
    }
}

This prints like (all units are in milliseconds)

29,35
17,28
11,23
14,42
..
32,23
47,30
..
11,23
12,26

We print the average case execution time followed by the worst case time. As seen earlier, there are some discrepancies and noise in the numbers. The difference is not uniform (or close to uniform).

Also, there are some cases where the worst case is less than the average case (32,23 and 47,30). And, this happens for the same data set (we run our measurement in a loop for the same array). But for most entries the worst case time > average case time. This is a second form of noise in our measurement.

Comparing Insertion sort with Quick sort

Let us compare the runtime of Insertion sort (average runtime of n^2) with Quick sort (average runtime of n log n).

 private static void compareInsertionAndQuickSort(int[] input) {
    for (int i = 0; i < 1000; i++) {
        int[] newArray = Arrays.copyOf(input, input.length);
        long start = System.nanoTime();
        Sorts.insertionSort(newArray);
        long end = System.nanoTime();
        long result1 = (end - start) / 1000_000L;

        newArray = Arrays.copyOf(input, input.length);
        start = System.nanoTime();
        Sorts.quickSort(newArray);
        end = System.nanoTime();
        long result2 = (end - start) / 1000_000L;
        System.out.println(result1 + "," + result2);
    }
 }

It prints the runtime of insertion sort followed by that of quick sort (again all units are in milliseconds)

Sample output:

27,6
21,0
31,0
14,0
16,2
16,1
..
12,14
..
35,0
11,0

First we can see deviations or oscillations in the time of insertion sort. Most cases, quick sort takes ~0 ms. But on some cases it takes as long as 14 ms. Second, on some cases the quick sort’s time is greater than that of insertion sort.

Problems with our very naive approach

Running a benchmark using a clock or time based does not yield the right results. I also mean the other approaches commonly used like Google Guava’s stopwatch and Instant class in Java. As said earlier we normally run the tests in a loop and take the average. The noise of variations in the numbers seen are due to a lot of things happening behind the scenes in the JVM. The first time a code runs takes more time that the following runs. JVM identifies a hot method and does some optimisations. Garbage collection (GC) also has a part in this.

Other than this, the things we do like copying the input array, printing the results to the screen can also play a part.

Summary

In this post, we saw the most common flawed approach of measuring the execution time of a code in Java. We started with measuring/benchmarking the insertion sort code. We followed it with by comparing the average and the worst case of it. Finally, we compared insertion sort with quick sort. In all cases, we noticed deviations in the output and learnt that such a method of measuring execution time of a code or benchmarking is wrong and should not be relied upon.

In the next post, we will improve this very naive approach to arrive at a reasonable sounding benchmarking method. Stay tuned.

Other references

https://stackoverflow.com/questions/2553347/how-can-i-benchmark-method-execution-time-in-java

Introduction#

Measuring execution time#

The sorting algorithms#

Very naive way to benchmark code#

Building our data set#

Measuring performance in a loop#

Performance of insertion sort when the input is reverse sorted#

Comparing Insertion sort with Quick sort#

Problems with our very naive approach#

Summary#

Other references#