
Memory visibility in Fork-join

Brian Goetz wrote a nice article on fork-join at http://www.ibm.com/developerworks/java/library/j-jtp03048.html. In it, he lists a merge sort algorithm using the fork-join mechanism, in which he sorts the two halves of an array in parallel, then merges the results.

The algorithm sorts two different sections of the same array simultaneously. Why isn't an AtomicIntegerArray or some other mechanism necessary to maintain visibility? What guarantee is there that one thread will see the writes done by the other, or is this a subtle bug? As a follow-up, does Scala's ForkJoinScheduler also make this guarantee?

Thanks!

Joshua Hartman asked Jan 26 '11 00:01


2 Answers

The join (of ForkJoin) itself requires a synchronization point; that's the most important piece of information. A synchronization point ensures that all writes that happened before it are visible after it.

If you take a look at the code, you can see where the synchronization point occurs. It is in a single method call, invokeAll:

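public static void invokeAll(ForkJoinTask<?> t1, ForkJoinTask<?> t2) {
    t2.fork();   // schedule t2 to run asynchronously
    t1.invoke(); // run t1 in the calling thread
    t2.join();   // wait for t2; this is the synchronization point
}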

Here t2 is forked so that another worker thread can run it, t1 executes its task in the calling thread, and the calling thread then waits on t2.join(). Once t2.join() returns, all writes made by t1 and t2 are visible to the calling thread.
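To connect this back to the question, here is a minimal sketch of a fork-join merge sort over a plain int[]. It is not the code from the article; the class name MergeSortTask, the THRESHOLD cutoff, and the sort helper are all assumptions made for illustration. The point is that the merge step reads the array only after invokeAll has returned.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical sketch: sorts a plain int[] with fork/join. All names are illustrative.
class MergeSortTask extends RecursiveAction {
    private static final int THRESHOLD = 16; // assumed cutoff for sequential sorting
    private final int[] data;
    private final int lo, hi;                // half-open range [lo, hi)

    MergeSortTask(int[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= THRESHOLD) {
            Arrays.sort(data, lo, hi);       // small range: sort in place
            return;
        }
        int mid = (lo + hi) >>> 1;
        // The two halves are sorted with plain writes, possibly on different worker threads.
        invokeAll(new MergeSortTask(data, lo, mid), new MergeSortTask(data, mid, hi));
        // invokeAll has returned, so both subtasks are done and their writes are visible here.
        merge(mid);
    }

    private void merge(int mid) {
        int[] left = Arrays.copyOfRange(data, lo, mid); // copy the left half out of the way
        int i = 0, j = mid, k = lo;
        while (i < left.length && j < hi) {
            data[k++] = (left[i] <= data[j]) ? left[i++] : data[j++];
        }
        while (i < left.length) {
            data[k++] = left[i++];
        }
        // Any remaining right-half elements are already in their final positions.
    }

    static void sort(int[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new MergeSortTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling MergeSortTask.sort(array) sorts the array in place. No AtomicIntegerArray is involved: each subtask writes to a disjoint range, and the parent reads those ranges only after the join.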

Edit: This edit is just to give a little more of an explanation of what I meant by synchronization point.

Let's say that you have two variables

int x;
volatile int y;

Any time you write to y, all writes that happened before that write become visible to any thread that subsequently reads y. For example:

public void doWork(){
   x = 10;
   y = 5;
}

If another thread reads y == 5, that thread is guaranteed to read x == 10. This is because the write to y creates a synchronization point: all writes made before it are visible to any thread that has seen the write to y.
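As a rough, self-contained sketch of that guarantee, the class below puts x, y, and doWork into a hypothetical holder called VolatilePublish and adds a reader that waits for y == 5 before reading x. The class name and the awaitAndReadX and demo methods are assumptions for illustration only.

```java
// Hypothetical holder mirroring the x / y fields and doWork() from the answer.
class VolatilePublish {
    int x;           // plain field
    volatile int y;  // volatile field

    void doWork() {
        x = 10;
        y = 5;       // the volatile write publishes the earlier write to x
    }

    // Spins until the volatile read observes 5, then reads the plain field.
    int awaitAndReadX() {
        while (y != 5) {
            Thread.yield();
        }
        return x;    // the write x = 10 happens-before this read
    }

    static int demo() {
        VolatilePublish p = new VolatilePublish();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = p.awaitAndReadX());
        Thread writer = new Thread(p::doWork);
        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join(); // joining the reader makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

If y were not volatile, the reader would have no guarantee of ever seeing y == 5, nor of seeing x == 10 if it did.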

With the fork-join pool, the join of a ForkJoinTask creates a synchronization point. So after t2.fork() and t1.invoke(), joining t2 ensures that all writes that previously happened will be seen. Since all the previous writes are to the same structure, they are safe with respect to visibility.

I would be happy to explain further if that isn't clear.

John Vint answered Sep 20 '22 05:09


Just a guess: merge includes joining on a Thread, and the join guarantees the visibility.

The second part is certain; I don't know how merge is implemented.
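The part about join can be sketched with a plain Thread. The class and method names below are hypothetical; the example only shows that a write made by a thread is visible to whoever successfully joins it, without volatile or locks.

```java
// Hypothetical sketch: a plain write made by a worker thread is read after Thread.join().
class ThreadJoinVisibility {
    static int demo() {
        int[] box = new int[1];                       // plain array, no volatile
        Thread worker = new Thread(() -> box[0] = 42);
        worker.start();
        try {
            worker.join();                            // everything the worker did happens-before this returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return box[0];
    }
}
```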

maaartinus answered Sep 18 '22 05:09