What is the key difference between Fork/Join and Map/Reduce? Do they differ in the kind of decomposition and distribution (data vs. computation)?

Difference between Fork/Join and Map/Reduce

2 Answers

One key difference is that F-J is designed to work within a single Java VM, while M-R is explicitly designed to work across a large cluster of machines. These are very different scenarios.

F-J offers facilities to partition a task into several subtasks recursively: a subtask can itself be split again, giving many tiers, and there is the possibility of 'inter-fork' communication while the subtasks run. It feels much more like traditional programming. It does not extend (at least in the paper) beyond a single machine. Great for taking advantage of your eight-core.
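To make the recursive partitioning concrete, here is a minimal sketch using the standard `java.util.concurrent` fork/join classes. The class name `SumTask`, the helper `parallelSum`, and the `THRESHOLD` value are illustrative assumptions, not anything taken from the paper; the point is only the shape: split, fork one half, compute the other, join.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example: sums a long[] by recursively splitting the index range.
class SumTask extends RecursiveTask<Long> {
    // Assumed cutoff: ranges at or below this size are summed sequentially.
    private static final int THRESHOLD = 1_000;

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: do the work directly in this thread.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        // Too large: split in two, and each half may split again (more tiers).
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                      // schedule the left half asynchronously
        long rightSum = right.compute();  // compute the right half in this thread
        return left.join() + rightSum;    // wait for the left half and combine
    }

    // Convenience entry point that runs the task on the JVM-wide common pool.
    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }
}
```

Calling `SumTask.parallelSum(array)` spreads the work over the cores of one machine. All subtasks read the same `long[]` in shared memory, which is exactly why this style stays inside a single JVM.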

M-R only does one big split: the mapped splits do not talk to each other at all, and then everything is reduced together. A single tier, no inter-split communication until reduce, and massively scalable. Great for taking advantage of your share of the cloud.
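The single-tier shape can be sketched in plain Java as a word count. This is an in-process illustration only, not the Hadoop API: the class `MiniMapReduce` and its methods are hypothetical names, and the splits are processed one after another here, whereas a real cluster would run each `map` call on a different machine. It assumes Java 9 or later for `Map.entry`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-process illustration of the map -> shuffle -> reduce shape.
class MiniMapReduce {
    // Map phase: one split in, a list of (word, 1) pairs out.
    // A map call sees only its own split and never talks to another one.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : split.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: all values collected for one key in, one total out.
    static int reduce(String key, List<Integer> values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    static Map<String, Integer> run(List<String> splits) {
        // Shuffle: group the mapped pairs by key. This is the only point
        // at which data from different splits comes together.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String split : splits) {
            for (Map.Entry<String, Integer> pair : map(split)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        // Reduce each key independently of every other key.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }
}
```

Because `map` depends only on its own split and `reduce` only on the values for its own key, both phases can be distributed without coordination between workers, which is where the scalability comes from.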

answered Sep 21 '22 19:09

tucuxi

There is a whole scientific paper on the subject, Comparing Fork/Join and MapReduce.

The paper compares the performance, scalability and programmability of three parallel paradigms: fork/join, MapReduce, and a hybrid approach.

Their basic finding is that Java fork/join has low startup latency and scales well for small inputs (<5 MB), but cannot process larger inputs because of the size restrictions of shared-memory, single-node architectures. MapReduce, on the other hand, has significant startup latency (tens of seconds) but scales well for much larger inputs (>100 MB) on a compute cluster.

But there is a lot more to read there if you're up for it.

answered Sep 18 '22 19:09

Per Quested Aronsson


Tags:

mapreduce

fork-join

hotzen
