Collectors toMap duplicate key

Introduction

I have earlier written a post on Java 8 Collectors toMap. It explains the usage of the toMap static methods in the Collectors class. I touched upon the scenario where it can throw an IllegalStateException when it encounters a duplicate key. In this post, I will dive deep into the Collectors.toMap duplicate key exception with an example. I will provide options on how you can resolve this error.

Also, the error messaging has been changed in Java 9 to provide a better message for the IllegalStateException (duplicate key exception)I found the changes they’ve made for this to be interesting and hence I’ve included it in the last part of this post i.e., I will also show how the Collectors.toMap library code was in Java 8 and how it was changed to support a better (and a clear) exception message for the duplicate keys case.

This post is structured as follows:

  • A very brief introduction to Collectors.toMap method
  • When Collectors.toMap would throw an IllegalStateException and the messaging difference between Java 8 and Java 9+.
  • Ways to fix the duplicate key exception
  • Changes in Java 9 to change the exception message 

If you are here to just learn how to fix the error, feel free to ignore the last part of this post.

A recap on Collectors toMap

Before moving on, here is a quick recap about the toMap methods.

There are three overloads of the toMap methods.

 

Overload 1 – M1
public static <T, K, U>
    Collector<T, ?, Map<K,U>> toMap(Function<? super T, ? extends K> keyMapper,
                                    Function<? super T, ? extends U> valueMapper)
It takes the key mapper and the value mapper. Uses a throwing merger (throws an exception) as the default merge function when it encounters a duplicate key. It returns the result in a HashMap (the type of map it returns is an implementation detail – not to be relied upon).

Overload 2 – M2
public static <T, K, U>
    Collector<T, ?, Map<K,U>> toMap(Function<? super T, ? extends K> keyMapper,
                                    Function<? super T, ? extends U> valueMapper,
                                    BinaryOperator<U> mergeFunction)
It takes the key and the value mapper as earlier. It explicitly takes a merge function used to resolve the value on duplicate keys. Similar to the method M1, it uses a HashMap as the result map type.

Overload 3 – M3
 public static <T, K, U, M extends Map<K, U>>
    Collector<T, ?, M> toMap(Function<? super T, ? extends K> keyMapper,
                                Function<? super T, ? extends U> valueMapper,
                                BinaryOperator<U> mergeFunction,
                                Supplier<M> mapSupplier)

 

Collectors toMap – Duplicate keys exception

Let me use the same example used in the earlier post on the Collectors.toMap. 

List<Student> students = new ArrayList<>();
students.add(new Student(1L, "John", 25));
students.add(new Student(2L, "Evans", 33));
students.add(new Student(3L, "Chris", 19));
students.add(new Student(4L, "Jennifer", 25));
students.add(new Student(5L, "Mitch", 29));
students.add(new Student(6L, "Evans", 25));

students.stream()
        .collect(Collectors.toMap(Student::getName, Function.identity()));

The above code collects the stream of Students into a map keyed by the student name. Since there are two students with name=Evans, and since we haven’t passed a merge function, it will throw an exception.
In Java 8, it will throw an IllegalStateException with the message as shown below.

java.lang.IllegalStateException: Duplicate key Id: [2, Evans]

The exception message is not clear and doesn’t give us much to act on. It specifies one of the values (toString representation of the value). In other words, it encountered a duplicate key when merging the student object ‘Evans‘ with id 2 and id 6. It threw the duplicate key exception with one of the values – the one that was already present on the map.

In Java 9, running the same above code gives us,

java.lang.IllegalStateException: Duplicate key Evans (attempted merging values [2, Evans] and [6, Evans])

This is the best exception message we can get. It not only specifies the key that clashes (Evans) but also the values (toString representation of the value).

Note: It will throw the duplicate key exception, even if the previous and the current values are the same.

students.stream()
        .collect(Collectors.toMap(Student::getName, s -> 1)); 

For the same key (‘Evans’), the earlier mapped value and the current mapped value (by the value mapper) are the same in this case (1). Nevertheless, we would get the below error (Java 9+ error message format).

java.lang.IllegalStateException:  Duplicate key Evans (attempted merging values 1 and 1)

Fixing Collectors#toMap duplicate key exception

We’ve seen why it throws a duplicate key exception. In short, when we collect a stream of values into a map, we have more than one stream element being mapped to the same map key (no matter what the mapped value is). So, it throws this exception, since it doesn’t know whether the subsequent mapped value (for the same key) should override the previous one or not. 

Let us now look at the options to fix it.

Option 1 – Avoid generating duplicate key in the key mapper

This may not be always possible, but worth exploring. In our code, the problem was that we used the student’s name as the key of the map and we had more than one student with the same name. If we could choose a different key for the map (say id), we can avoid the error.

Map<Long, String> idToNames = students.stream()
        .collect(Collectors.toMap(Student::getId, Student::getName));
System.out.println(idToNames);

Prints,

{1=John, 2=Evans, 3=Chris, 4=Jennifer, 5=Mitch, 6=Evans}

Option 2 – Pass a merge function

The best option to resolve the issue is to pass a merge function. I’ve shown this in the Collectors toMap post. Basically, a merge function is a BinaryOperator<T> i.e., a BiFunction<T,T,T> and we can use it to resolve collisions between values associated with the same key.

In this case, when the stream pipeline encounters a duplicate key, it will call the merge function with both the values i.e., the value already mapped to the current (duplicate) key and the new value (result of the value mapper function for the current stream element). Then, we can do anything we want to resolve the collision. This will depend on your use-case or need. 

One option is to pick any one of them.

Map<String, Student> nameToStudentInstance = students.stream()
        .collect(Collectors.toMap(Student::getName, Function.identity(),
                (oldValue, newValue) -> oldValue));
System.out.println(nameToStudentInstance);
For student Evans, when it encounters the duplicate key, it will call the merge function with the previously mapped value and the new value from the value mapper function.

Running this, we get,  

{Mitch=[5, Mitch], Jennifer=[4, Jennifer], Chris=[3, Chris], Evans=[2, Evans], John=[1, John]}

Note that the final mapped Student instance for the key ‘Evans’ is the student instance with id = 2. This is because we’ve picked the old value in the merge function and hence it ignores the second student instance ‘[6, Evans]’.

Other options with the merge function

There are other things you can do with a merge function. You can create and return new value or merge them.

Let us take a different example here.

We have a class representing an item, as shown below.

public class Item {
    private String name;
    private int quantity;

    private Item(String name, int quantity) {
        this.name = name;
        this.quantity = quantity;
    }

    public String getName() {
        return name;
    }

    public int getQuantity() {
        return quantity;
    }

    //equals, hashCode skipped
    
    @Override
    public String toString() {
        return "[" + name + ":" + quantity + "]";
    }
}

We have a list of items and we want to convert the list of items as a map of item name to quantity.

List<Item> items = Arrays.asList(
        new Item("I1", 2),
        new Item("I2", 3),
        new Item("I3", 5),
        new Item("I1", 3),
        new Item("I4", 6)
);
Map<String, Integer> itemNameToQuantity = items.stream()
        .collect(Collectors.toMap(Item::getName, Item::getQuantity));

Since we have item ‘I1’ twice, it will throw a duplicate key exception. To fix it, we can use the merge function, and in it we can return the sum of the quantities of the same item.

Map<String, Integer> itemNameToQuantity = items.stream()
        .collect(Collectors.toMap(Item::getName, Item::getQuantity,
                Integer::sum));
System.out.println(itemNameToQuantity); // {I1=5, I2=3, I3=5, I4=6}

Here, Integer::sum is a method reference of the lambda expression (oldQuantity, newQuantity) -> oldQuantity + newQuantity).

If we are collecting as Map<String, Item>, then in the merge function, we can create a new Item, as shown below. 

We create an Item instance with the name and its quantity set to the sum of the quantities of the old (existing) item and the new item.
Map<String, Item> itemNameToItemInstance = items.stream()
        .collect(Collectors.toMap(Item::getName, Function.identity(),
                (oldItem, newItem) -> new Item(oldItem.getName(),
                        oldItem.getQuantity() + newItem.getQuantity())));
System.out.println(itemNameToItemInstance);

The resultant map value is,

{I1=[I1:5], I2=[I2:3], I3=[I3:5], I4=[I4:6]}

Option 3 – Collect the values as a list

The last option is to make the value as a list and collect all the values into a list for duplicate keys. With the student example, the result map will be of type Map<String, List<Student>> i.e., for each student name (key), there will be a list of student instances (values).

For this, we have to create a mutable list in the value mapper function with the current student instance in it. Then, we pass a merge function to merge the two lists.

Map<String, List<Student>> nameToStudentsInstance = students.stream()
        .collect(Collectors.toMap(Student::getName, student -> new ArrayList<>(Collections.singletonList(student)),
                (oldList, newList) -> {
                    oldList.addAll(newList);
                    return oldList;
                }));
System.out.println(nameToStudentsInstance);

Prints,

{Mitch=[[5, Mitch]], Jennifer=[[4, Jennifer]], Chris=[[3, Chris]], Evans=[[2, Evans], [6, Evans]], John=[[1, John]]}

Even though the above will work, it is not recommended to do this way.

What we are basically doing it grouping the students by their name and we have a special collector to do just that – Collectors.groupingBy. Using that, it would look like as shown below and resulting in the same result as before.

Map<String, List<Student>> nameToStudentsInstance = students.stream()
        .collect(Collectors.groupingBy(Student::getName));

JDK changes in Java 9 to change the duplicate key error message

The rest of the post is about understanding the changes in JDK 9 to provide a better message in the IllegalStateException.

Let us first look at the implementation of the toMap method in Java 8 from which we will understand the reason why passing the key in the exception message was not possible. Then, we will look at the changes made in Java 9 to enable this.

Java 8 – Collectors toMap code

The toMap methods M1 and M2 delegate to the third overload (M3) of the toMap method. The first overload uses a throwingMerger as the merge function (will see this soon). The first and the second overload use a HashMap as default map type on which the results will be collected into.

public static <T, K, U>
  Collector<T, ?, Map<K,U>> toMap(Function<? super T, ? extends K> keyMapper,
                                Function<? super T, ? extends U> valueMapper) {
    return toMap(keyMapper, valueMapper, throwingMerger(), HashMap::new);
  }
    
public static <T, K, U>
  Collector<T, ?, Map<K,U>> toMap(Function<? super T, ? extends K> keyMapper,
                                Function<? super T, ? extends U> valueMapper,
                                BinaryOperator<U> mergeFunction) {
    return toMap(keyMapper, valueMapper, mergeFunction, HashMap::new);
}

The third overload (M3) of the toMap code is shown below.

public static <T, K, U, M extends Map<K, U>>
  Collector<T, ?, M> toMap(Function<? super T, ? extends K> keyMapper,
                            Function<? super T, ? extends U> valueMapper,
                            BinaryOperator<U> mergeFunction,
                            Supplier<M> mapSupplier) {
    BiConsumer<M, T> accumulator
            = (map, element) -> map.merge(keyMapper.apply(element),
                                          valueMapper.apply(element), mergeFunction);
    return new CollectorImpl<>(mapSupplier, accumulator, mapMerger(mergeFunction), CH_ID);
  }

It returns an implementation of the Collector (CollectorImpl) which is a private static class and hence cannot be accessed from outside the class.

It creates an accumulator and a combiner. The accumulator is used to combine or fold the individual values of a stream into a mutable result container. Here, it will take each element of the stream and insert it into the map as we desire (using the passed key and value mapper). The combiner takes two partial result containers and merges them. In other words, in this case, it will take two intermediate partial result maps, merges them, and returns the merged map. The combiner is employed only when using a parallel stream.

 
When making a stream parallel, the elements of the stream are split into multiple groups. The accumulating operation happens in each group, thus producing multiple intermediate result containers. The combiner then takes each of the intermediate results and merges them to produce the final result.

 

Returning back to the toMap’s accumulator, it takes a map and an element from the stream and folds the element into the map. In order to do this, it has to compute the key and the value using the passed keyMapper and the valueMapper. Then it merges using the merge method as part of the Map class. 
 
It constructs the combiner using the mapMerger private method. It takes two maps and returns the merged map (a BinaryOperator) and loops through each entry of the second map and calls merge on the first map. The merge method takes a mapper (mergeFunction) and uses it when there is a duplicate key. This is how the function looks:
 
private static <K, V, M extends Map<K,V>>
    BinaryOperator<M> mapMerger(BinaryOperator<V> mergeFunction) {
        return (m1, m2) -> {
            for (Map.Entry<K,V> e : m2.entrySet())
                m1.merge(e.getKey(), e.getValue(), mergeFunction);
            return m1;
        };
    }

Map merge method

The merge method was added to the Map class in Java 8. The merge method takes a key, a value, and a merge function. 

Case 1: The key is absent
The value is inserted against the key. It simply does map.put(<key>, <value>).
 
Case 2: The key already exists
A duplicate key case. It will then use the merge function to determine the new value. The old (existing) value and the new value are passed to the merge function. The output of the merge function is then inserted against the key.
 
From the javadoc of Map.merge, it simply does the following,
V oldValue = map.get(key);
V newValue = (oldValue == null) ? value :
          remappingFunction.apply(oldValue, value);
if (newValue == null)
    map.remove(key);
else
    map.put(key, newValue);

When using the first overload toMap method (M1), it calls the throwingMerger method to get the default merge function. As the name suggests, it will throw an exception on duplicate keys.

private static <T> BinaryOperator<T> throwingMerger() {
    return (u,v) -> { throw new IllegalStateException(String.format("Duplicate key %s", u)); };
}
The merge function returned above takes two values and throws an IllegalStateException where the message has the first value. The above returned merge function is used by the accumulator and is also passed to the mapMerger function (used by the combiner) (refer to the code of the toMap – M3 overload method).
 

Since the merge function of the Map takes the old and the new value, the above function gets the old and the new value (for u and v parameters). This is the reason why the exception message of the Collectors.toMap duplicate key had the value rather than the key.

Collectors toMap – Library code changes in Java 9

In Java 9, the first version of the toMap overload (M1) was completely rewritten with two helper functions to enable throwing an exception with a helpful exception message. Let us look at the implementation of toMap (first overload) in Java 9.

public static <T, K, U>
  Collector<T, ?, Map<K,U>> toMap(Function<? super T, ? extends K> keyMapper,
                                   Function<? super T, ? extends U> valueMapper) {
       return new CollectorImpl<>(HashMap::new,
                                  uniqKeysMapAccumulator(keyMapper, valueMapper),
                                  uniqKeysMapMerger(),
                                  CH_ID);
}

It uses two new private methods to construct the accumulator and the combiner. The uniqKeysMapAccumulator method builds the accumulator and uniqKeysMapMerger builds the combiner.

uniqKeysMapAccumulator method

BiConsumer<Map<K, V>, T> 
    uniqKeysMapAccumulator(Function<? super T, ? extends K> keyMapper,
                           Function<? super T, ? extends V> valueMapper) {
    return (map, element) -> {
        K k = keyMapper.apply(element);
        V v = Objects.requireNonNull(valueMapper.apply(element));
        V u = map.putIfAbsent(k, v);
        if (u != null) throw duplicateKeyException(k, u, v);
    };
}

This method’s responsibility is to construct the accumulator i.e., similar to the accumulator seen earlier in the code of the M3 method. The difference is that it does not call Map#merge. The only reason they weren’t able to add the conflicting key to the exception message was because of using Map#merge method. Now, it uses Map’s putIfAbsent and not merge. The putIfAbsent method returns the previous value associated with the key and null if there was none.

So, it checks if the return value of putIfAbsent is non-null. If yes, then it is a duplicate key case. Then it throws an exception using the duplicateKeyException method.

 

uniqKeysMapMerger method

private static <K, V, M extends Map<K,V>>
    BinaryOperator<M> uniqKeysMapMerger() {
       return (m1, m2) -> {
           for (Map.Entry<K,V> e : m2.entrySet()) {
               K k = e.getKey();
               V v = Objects.requireNonNull(e.getValue());
               V u = m1.putIfAbsent(k, v);
               if (u != null) throw duplicateKeyException(k, u, v);
        }
        return m1;
    };
}

It is the combiner function. The core logic is similar to the above except that its job is to merge two maps. Remember that the accumulator folds an element into a result, whereas the combiner merges two partial result containers.

The new Collectors toMap duplicate key exception method

Finally, this is the last piece of the jigsaw. As seen from above, when uniqKeysMapAccumulator or uniqKeysMapMerger calls this method to throw the IllegalStateException, the key is known to us. This now enables the toMap to report the conflicting or duplicate key in the thrown exception. It also includes both the values (the old and the new value). Note that the previous exception (in Java 8) included only one of the values, i.e., parameter u in the below code (the previously mapped value).

private static IllegalStateException duplicateKeyException(
            Object k, Object u, Object v) {
    return new IllegalStateException(String.format(
        "Duplicate key %s (attempted merging values %s and %s)", k, u, v));
}

Conclusion

In this post, we revisited the example where Collectors toMap can throw an IllegalStateException due to duplicate keys. We saw three options to handle the duplicate key exception.

We then saw the implementation code of the toMap method in Java 8. From that we learnt that using Map’s merge method was preventing the library developers from propagating the value of the key in the exception message. Last, we saw the changes made in Java 9 to enable passing back the value of the duplicate key in the exception message.

References

https://stackoverflow.com/questions/40039649/why-does-collectors-tomap-report-value-instead-of-key-on-duplicate-key-error

https://bugs.openjdk.java.net/browse/JDK-8040892

Leave a Reply