Parallel streams

There is a function that computes the most frequent name in a `Human[] people` array in parallel, but it has a data race. Why?

    Map<String, Integer> nameMap = new ConcurrentHashMap<>();
    Arrays.stream(people)
          .parallel()
          .filter(p -> p.isAdult())
          .map(Human::getName)
          .forEach(p -> nameMap.put(p, nameMap.containsKey(p) ? nameMap.get(p) + 1 : 1));
    return nameMap.entrySet().parallelStream()
          .max((entry1, entry2) -> entry1.getValue() > entry2.getValue() ? 1 : -1)
          .get().getKey();
Asked May 03 '26 by KateS
1 Answer

Because you are doing a get, then an increment, then a put; in between, another thread may have already put an updated count for that entry into nameMap, so your write silently overwrites it and the count is lost.

You could use ConcurrentHashMap#merge, which performs that read-modify-write atomically, or better, use Collectors.toConcurrentMap.
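A minimal sketch of the merge-based fix; the `Human` type here is a stand-in inferred from the question, not the asker's actual class:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MostFrequentName {
    // Minimal stand-in for the question's Human class
    record Human(String name, int age) {
        boolean isAdult() { return age >= 18; }
        String getName() { return name; }
    }

    static String mostFrequentName(Human[] people) {
        Map<String, Integer> nameMap = new ConcurrentHashMap<>();
        Arrays.stream(people)
              .parallel()
              .filter(Human::isAdult)
              .map(Human::getName)
              // merge does the read-modify-write atomically per key,
              // so concurrent increments cannot overwrite each other
              .forEach(name -> nameMap.merge(name, 1, Integer::sum));
        return nameMap.entrySet().stream()
                      .max(Map.Entry.comparingByValue())
                      .map(Map.Entry::getKey)
                      .orElseThrow();
    }

    public static void main(String[] args) {
        Human[] people = {
            new Human("Alice", 30), new Human("Bob", 25),
            new Human("Alice", 40), new Human("Carol", 12)
        };
        System.out.println(mostFrequentName(people)); // prints Alice
    }
}
```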

EDIT

You could have written it a bit more clearly:

  Arrays.stream(people)
        .parallel()
        .filter(Human::isAdult)
        .collect(Collectors.groupingBy(Human::getName, Collectors.counting()))
        .entrySet()
        .stream()
        .max(Map.Entry.comparingByValue())
        .map(Map.Entry::getKey)
        .get();

Just note that I am fairly sure you don't need parallel here at all.
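For completeness, the Collectors.toConcurrentMap variant mentioned above might look like the sketch below; as before, the `Human` type is a hypothetical stand-in for the asker's class:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

public class NameCounts {
    // Hypothetical stand-in for the question's Human class
    record Human(String name, int age) {
        boolean isAdult() { return age >= 18; }
        String getName() { return name; }
    }

    static String mostFrequentName(Human[] people) {
        // toConcurrentMap lets all worker threads of the parallel stream
        // write into one shared ConcurrentMap; Integer::sum merges counts
        // for duplicate keys atomically
        ConcurrentMap<String, Integer> counts = Arrays.stream(people)
                .parallel()
                .filter(Human::isAdult)
                .collect(Collectors.toConcurrentMap(Human::getName, h -> 1, Integer::sum));
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow();
    }

    public static void main(String[] args) {
        Human[] people = {
            new Human("Alice", 30), new Human("Bob", 25), new Human("Alice", 40)
        };
        System.out.println(mostFrequentName(people)); // prints Alice
    }
}
```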

Answered May 05 '26 by Eugene

