
Why is shared mutability bad?

I was watching a presentation on Java, and at one point, the lecturer said:

"Mutability is OK, sharing is nice, shared mutability is devil's work."

What he was referring to is the following piece of code, which he considered an "extremely bad habit":

    // double the even values and put that into a list.
    List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 1, 2, 3, 4, 5);
    List<Integer> doubleOfEven = new ArrayList<>();

    numbers.stream()
           .filter(e -> e % 2 == 0)
           .map(e -> e * 2)
           .forEach(e -> doubleOfEven.add(e));

He then proceeded to write the code that should be used instead:

    List<Integer> doubleOfEven2 =
          numbers.stream()
                 .filter(e -> e % 2 == 0)
                 .map(e -> e * 2)
                 .collect(toList());

I don't understand why the first piece of code is a "bad habit". To me, they both achieve the same goal.

asked May 27 '17 at 17:05 by George Cernat





2 Answers

Explanation of the first example snippet

The problem comes into play when performing parallel processing.

    // double the even values and put that into a list.
    List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 1, 2, 3, 4, 5);
    List<Integer> doubleOfEven = new ArrayList<>();

    numbers.stream()
           .filter(e -> e % 2 == 0)
           .map(e -> e * 2)
           .forEach(e -> doubleOfEven.add(e)); // <--- Unnecessary use of side-effects!

This uses side-effects unnecessarily. Not all side-effects are bad when used correctly, but with streams one must provide behaviour that is safe to execute concurrently on different pieces of the input, i.e. code that doesn't access shared mutable data to do its work.

The line:

    .forEach(e -> doubleOfEven.add(e)); // Unnecessary use of side-effects!

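unnecessarily uses side-effects, and when executed in parallel, the non-thread-safety of ArrayList can cause incorrect results: several threads call add on the same list at once, so elements can be lost or the call can fail outright.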
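To make the failure mode concrete, here is a minimal sketch that runs the question's pipeline both sequentially and in parallel. The class name SideEffectDemo and the method doubleEvensWithSideEffect are hypothetical names chosen for illustration; whether the parallel run actually misbehaves on a given execution depends on thread scheduling, so a clean-looking result proves nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the original question.
public class SideEffectDemo {

    // Same pipeline as in the question; the only switch is sequential vs. parallel.
    static List<Integer> doubleEvensWithSideEffect(List<Integer> numbers, boolean parallel) {
        List<Integer> doubleOfEven = new ArrayList<>(); // shared, mutable, not thread-safe
        Stream<Integer> stream = parallel ? numbers.parallelStream() : numbers.stream();
        stream.filter(e -> e % 2 == 0)
              .map(e -> e * 2)
              .forEach(e -> doubleOfEven.add(e)); // side effect on shared state
        return doubleOfEven;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 1, 2, 3, 4, 5);

        // Sequential: a single thread mutates the list.
        System.out.println(doubleEvensWithSideEffect(numbers, false));

        // Parallel: several threads may call add() at once. The list may come back
        // reordered, with missing or null entries, or add() may throw.
        try {
            System.out.println(doubleEvensWithSideEffect(numbers, true));
        } catch (RuntimeException ex) {
            System.out.println("parallel run failed: " + ex);
        }
    }
}
```

With an input this small the parallel run will often look fine, which is exactly what makes this kind of bug hard to catch in testing.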

A while back I read a blog post by Henrik Eichenhardt explaining why shared mutable state is the root of all evil.

Here is a short piece of reasoning, extracted from the blog, as to why shared mutability is not good:

non-determinism = parallel processing + mutable state

This equation basically means that both parallel processing and mutable state combined result in non-deterministic program behaviour. If you just do parallel processing and have only immutable state everything is fine and it is easy to reason about programs. On the other hand if you want to do parallel processing with mutable data you need to synchronize the access to the mutable variables which essentially renders these sections of the program single threaded. This is not really new but I haven't seen this concept expressed so elegantly. A non-deterministic program is broken.

The blog goes on to explain in detail why parallel programs without proper synchronization are broken, which you can find within the appended link.
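The point in the quote about synchronization can be sketched as follows. This is only an illustration under my own assumptions: the class name SynchronizedSideEffectDemo and the method doubleEvensSynchronized are hypothetical, and wrapping the list with Collections.synchronizedList is just one way to synchronize access. Every add now takes a lock, so the writes happen one at a time, and the order of the elements still depends on scheduling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not part of the original answer.
public class SynchronizedSideEffectDemo {

    static List<Integer> doubleEvensSynchronized(List<Integer> numbers) {
        // Every add() takes the wrapper's lock, so writers run one at a time.
        List<Integer> doubleOfEven = Collections.synchronizedList(new ArrayList<>());
        numbers.parallelStream()
               .filter(e -> e % 2 == 0)
               .map(e -> e * 2)
               .forEach(doubleOfEven::add);
        return doubleOfEven;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 1, 2, 3, 4, 5);
        List<Integer> result = doubleEvensSynchronized(numbers);

        // No element is lost, but the order depends on thread scheduling.
        System.out.println(result.size());
        System.out.println(result);
    }
}
```

So synchronization buys back correctness of the contents, at the price of serializing the writes, and it still does not give a deterministic order.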

Explanation of the second example snippet

    List<Integer> doubleOfEven2 =
          numbers.stream()
                 .filter(e -> e % 2 == 0)
                 .map(e -> e * 2)
                 .collect(toList()); // No side-effects!

This performs a mutable reduction on the elements of the stream using a Collector.

Because the stream itself builds and returns the result, no state is shared with the lambdas. This is much safer, more efficient, and more amenable to parallelization.
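As a quick check of that claim, here is a small sketch that runs the collect version both sequentially and in parallel. The class name CollectDemo and the method doubleEvens are hypothetical; the pipeline itself is the one from the question, written with Collectors.toList() spelled out instead of a static import.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original answer.
public class CollectDemo {

    static List<Integer> doubleEvens(List<Integer> numbers, boolean parallel) {
        return (parallel ? numbers.parallelStream() : numbers.stream())
                .filter(e -> e % 2 == 0)
                .map(e -> e * 2)
                .collect(Collectors.toList()); // no state shared with the lambdas
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 1, 2, 3, 4, 5);

        System.out.println(doubleEvens(numbers, false)); // [4, 8, 4, 8]
        System.out.println(doubleEvens(numbers, true));  // [4, 8, 4, 8]
    }
}
```

Switching to a parallel stream changes nothing about the result, only about how the work is scheduled.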

answered Oct 15 '22 at 19:10 by Ousmane D.



The thing is that the lecturer is also slightly wrong. The example that he provided uses forEach, which is documented as:

The behavior of this operation is explicitly nondeterministic. For parallel stream pipelines, this operation does not guarantee to respect the encounter order of the stream, as doing so would sacrifice the benefit of parallelism...

You could use:

    numbers.stream()
           .filter(e -> e % 2 == 0)
           .map(e -> e * 2)
           .parallel()
           .forEachOrdered(e -> doubleOfEven.add(e));

And you would be guaranteed the same result every time, because forEachOrdered processes the elements one at a time, in encounter order.
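Here is that variant as a runnable sketch. The class name ForEachOrderedDemo and the method doubleEvensOrdered are hypothetical. Note that filter and map may still run on several threads; only the terminal action is applied one element at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the original answer.
public class ForEachOrderedDemo {

    static List<Integer> doubleEvensOrdered(List<Integer> numbers) {
        List<Integer> doubleOfEven = new ArrayList<>();
        numbers.stream()
               .filter(e -> e % 2 == 0)
               .map(e -> e * 2)
               .parallel()
               // Applied one element at a time, in encounter order, so the
               // plain ArrayList is never mutated by two threads at once.
               .forEachOrdered(doubleOfEven::add);
        return doubleOfEven;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 1, 2, 3, 4, 5);
        System.out.println(doubleEvensOrdered(numbers)); // [4, 8, 4, 8]
    }
}
```

This is still a side effect on a list that lives outside the pipeline, so it is a fix for this particular example rather than a habit to adopt.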

On the other hand, the example that uses Collectors.toList is better, because Collectors respect encounter order, so it works just fine.

An interesting point is that Collectors.toList uses ArrayList underneath, which is not a thread-safe collection. It's just that it uses many of them (for parallel processing) and merges them at the end.
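The "many lists, merged at the end" idea can be sketched with the three-argument Stream.collect, which takes a supplier, an accumulator and a combiner. This is not the actual source of Collectors.toList, only an illustration of the same three roles; the class name ManualCollectDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; an illustration, not the JDK's implementation.
public class ManualCollectDemo {

    static List<Integer> doubleEvens(List<Integer> numbers) {
        return numbers.parallelStream()
                .filter(e -> e % 2 == 0)
                .map(e -> e * 2)
                .collect(
                        () -> new ArrayList<Integer>(),       // supplier: a fresh list for each partial result
                        (list, e) -> list.add(e),             // accumulator: only touches its own list
                        (left, right) -> left.addAll(right)); // combiner: merges partial lists in order
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 1, 2, 3, 4, 5);
        System.out.println(doubleEvens(numbers)); // [4, 8, 4, 8]
    }
}
```

Each ArrayList is only ever written by the task that created it, which is why a non-thread-safe collection is enough here.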

A last note: parallel and sequential do not influence the encounter order; it's the operations applied to the Stream that do. Excellent read here.

We also need to keep in mind that even a thread-safe collection is not completely safe with Streams, especially when you are relying on side-effects.

    List<Integer> numbers = Arrays.asList(1, 3, 3, 5);
    Set<Integer> seen = Collections.synchronizedSet(new HashSet<>());

    List<Integer> collected = numbers.stream()
            .parallel()
            .map(e -> {
                if (seen.add(e)) {
                    return 0;
                } else {
                    return e;
                }
            })
            .collect(Collectors.toList());

    System.out.println(collected);

collected at this point could be [0, 3, 0, 0] or [0, 0, 3, 0], depending on which of the two 3s reaches seen.add first.
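To see this for yourself, here is a small sketch that repeats the pipeline many times and records the distinct results. The class name StatefulLambdaDemo and the methods markFirstSeen and observedOutcomes are hypothetical. My reasoning, which you should treat as an assumption rather than a guarantee from the documentation, is that exactly one of the two 3s finds the set already containing 3, so only the two lists above can show up. How many of them you actually observe depends on your machine.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original answer.
public class StatefulLambdaDemo {

    // Same logic as the snippet above: the first occurrence seen becomes 0.
    static List<Integer> markFirstSeen(List<Integer> numbers) {
        Set<Integer> seen = Collections.synchronizedSet(new HashSet<>());
        return numbers.parallelStream()
                .map(e -> seen.add(e) ? 0 : e) // stateful lambda: result depends on timing
                .collect(Collectors.toList());
    }

    // Runs the pipeline repeatedly and collects every distinct result.
    static Set<List<Integer>> observedOutcomes(List<Integer> numbers, int runs) {
        Set<List<Integer>> outcomes = new HashSet<>();
        for (int i = 0; i < runs; i++) {
            outcomes.add(markFirstSeen(numbers));
        }
        return outcomes;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 3, 3, 5);
        System.out.println(observedOutcomes(numbers, 10_000));
    }
}
```

The set is thread-safe, so nothing is corrupted, yet the output still differs between runs. Thread safety and determinism are separate properties.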

answered Oct 15 '22 at 19:10 by Eugene
