Brian Goetz wrote a nice article on fork-join at http://www.ibm.com/developerworks/java/library/j-jtp03048.html. In it, he lists a merge sort algorithm using the fork-join mechanism, in which he performs the sort on the two halves of an array in parallel, then merges the results.
The algorithm sorts two different sections of the same array simultaneously. Why isn't an AtomicIntegerArray or some other mechanism necessary to maintain visibility? What guarantee is there that one thread will see the writes done by the other, or is this a subtle bug? As a follow-up, does Scala's ForkJoinScheduler also make this guarantee?
Thanks!
Fork/Join in Java is used to make efficient use of the CPU cores (the units that actually execute instructions). The framework splits a bigger task into smaller subtasks, which are then distributed among the cores. The results of these subtasks are then joined to produce the final result.
ForkJoinPool: It is an implementation of ExecutorService that manages worker threads and provides tools to get information about the thread pool's state and performance. A worker thread can execute only one task at a time, but the ForkJoinPool doesn't create a separate thread for every single subtask.
The problem of a thread not seeing the latest value of a variable, because another thread's write has not yet been made visible to it (for example, it has not been written back to main memory), is called a "visibility" problem: the updates of one thread are not visible to other threads.
The fork/join framework was designed to speed up the execution of tasks that can be divided into other smaller subtasks, executing them in parallel and then combining their results to get a single one.
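As a rough illustration of the split, execute and combine pattern described above, here is a minimal sketch. The class name SumTask, the helper sum and the THRESHOLD value are made up for this example and are not taken from the article; only ForkJoinPool, RecursiveTask, fork and join are real library names.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example task: sums a range of an array by splitting it in half.
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // assumed cutoff for sequential work
    private final int[] data;
    private final int lo, hi; // half-open range [lo, hi)

    SumTask(int[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            // Small enough: do the work directly in this thread.
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left = new SumTask(data, lo, mid);
        SumTask right = new SumTask(data, mid, hi);
        right.fork();                  // schedule the right half asynchronously
        long leftSum = left.compute(); // work on the left half in this thread
        return leftSum + right.join(); // wait for the right half and combine
    }

    // Convenience entry point that runs the task in the common pool.
    static long sum(int[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        int[] data = new int[10_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(sum(data)); // prints 50005000
    }
}
```

The pool reuses a small set of worker threads for all the subtasks, so the number of SumTask objects created is unrelated to the number of threads.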
The join (of ForkJoin) itself requires a synchronization point; that's the most important piece of information. A synchronization point ensures that all writes made before it are visible after it.
If you take a look at the code, you can see where the synchronization point occurs. It is just one method call, invokeAll:
public static void invokeAll(ForkJoinTask<?> t1, ForkJoinTask<?> t2) {
    t2.fork();
    t1.invoke();
    t2.join();
}
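Here t2 is forked, i.e. scheduled to run asynchronously in the pool (possibly on another worker thread, not another process), t1 executes in the calling thread, and the calling thread then waits on t2.join(). Once t2.join() returns, all writes made by t1 and t2 are visible to the caller.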
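To make this concrete, here is a sketch of a merge sort in the style the question describes: two subtasks write to disjoint halves of the same plain int[] and the merge step reads both halves after invokeAll returns. This is not the code from the article; MergeSortTask, its merge helper and the deliberately tiny THRESHOLD are names and values assumed for illustration.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical example: in-place parallel merge sort on a shared, non-atomic int[].
class MergeSortTask extends RecursiveAction {
    private static final int THRESHOLD = 4; // assumed cutoff, kept tiny for illustration
    private final int[] a;
    private final int lo, hi; // half-open range [lo, hi)

    MergeSortTask(int[] a, int lo, int hi) {
        this.a = a;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= THRESHOLD) {
            Arrays.sort(a, lo, hi); // small range: sort sequentially
            return;
        }
        int mid = (lo + hi) >>> 1;
        // Both subtasks write to disjoint parts of the same array.
        invokeAll(new MergeSortTask(a, lo, mid), new MergeSortTask(a, mid, hi));
        // invokeAll has returned, so both subtasks are complete and
        // their writes are visible to this thread before it merges.
        merge(mid);
    }

    // Merges the sorted ranges [lo, mid) and [mid, hi) in place.
    private void merge(int mid) {
        int[] left = Arrays.copyOfRange(a, lo, mid);
        int i = 0, j = mid, k = lo;
        while (i < left.length && j < hi) {
            a[k++] = (left[i] <= a[j]) ? left[i++] : a[j++];
        }
        while (i < left.length) a[k++] = left[i++];
        // Any remaining right-half elements are already in their final place.
    }

    static void sort(int[] a) {
        ForkJoinPool.commonPool().invoke(new MergeSortTask(a, 0, a.length));
    }

    public static void main(String[] args) {
        int[] a = {5, 3, 9, 1, 7, 2, 8};
        sort(a);
        System.out.println(Arrays.toString(a)); // prints [1, 2, 3, 5, 7, 8, 9]
    }
}
```

No AtomicIntegerArray or volatile field is involved; the ordering between the subtasks' writes and the merge step's reads comes from the completion of the forked tasks.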
Edit: This edit is just to give a little more of an explanation of what I meant by synchronization point.
Let's say that you have two variables
int x;
volatile int y;
Any time you write to y, all writes that happened before that write will be visible to any thread that subsequently reads y. For example
public void doWork(){
    x = 10;
    y = 5;
}
If another thread reads y and sees 5, that thread is guaranteed to read x as 10. This is because the write to y creates a synchronization point: all writes made before it are visible to a thread that reads y after the write.
With the fork/join pool, joining a ForkJoinTask creates a synchronization point in the same way. So after t2.fork() and t1.invoke(), the call to t2.join() ensures that all writes made by both tasks are seen by the joining thread. Since every write to the array happens before the join, the array is safe to read afterwards.
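The volatile case can be tried out with a small sketch like the one below. The class VolatileVisibility and the method readXAfterY are hypothetical names for this example; the reader thread spins until it observes y == 5 and only then reads the plain field x.

```java
// Hypothetical demo: a plain write published through a later volatile write.
class VolatileVisibility {
    static int x;          // plain field
    static volatile int y; // volatile field used as the synchronization point

    // Returns the value of x that the reader thread saw after observing y == 5.
    static int readXAfterY() {
        x = 0;
        y = 0;
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (y != 5) {
                // spin until the volatile write becomes visible
            }
            seen[0] = x; // guaranteed to be 10 once y == 5 has been read
        });
        Thread writer = new Thread(() -> {
            x = 10; // plain write, ordered before the volatile write below
            y = 5;  // volatile write publishes the earlier write to x
        });
        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join(); // Thread.join also makes seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(readXAfterY()); // prints 10
    }
}
```

If y were not volatile, the reader would have no such guarantee: it might spin forever or read a stale x.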
I would be happy to explain further if that isn't clear.
Just a guess: merge involves joining on a Thread, and the join guarantees the visibility.
I'm sure of the second part; I don't know how merge is implemented.