I ran a Hadoop MapReduce job on a 1.1 GB file multiple times with different numbers of mappers and reducers (e.g. 1 mapper and 1 reducer, 1 mapper and 2 reducers, 1 mapper and 4 reducers, ...).
Hadoop is installed on a quad-core machine with hyper-threading.
The following are the top 5 results, sorted by shortest execution time:
+----------+----------+----------+
| time | # of map | # of red |
+----------+----------+----------+
| 7m 50s | 8 | 2 |
| 8m 13s | 8 | 4 |
| 8m 16s | 8 | 8 |
| 8m 28s | 4 | 8 |
| 8m 37s | 4 | 4 |
+----------+----------+----------+
The results for 1 - 8 mappers and 1 - 8 reducers (columns = # of mappers, rows = # of reducers, times in mm:ss):
+---------+---------+---------+---------+---------+
| | 1 | 2 | 4 | 8 |
+---------+---------+---------+---------+---------+
| 1 | 16:23 | 13:17 | 11:27 | 10:19 |
+---------+---------+---------+---------+---------+
| 2 | 13:56 | 10:24 | 08:41 | 07:52 |
+---------+---------+---------+---------+---------+
| 4 | 14:12 | 10:21 | 08:37 | 08:13 |
+---------+---------+---------+---------+---------+
| 8 | 14:09 | 09:46 | 08:28 | 08:16 |
+---------+---------+---------+---------+---------+
(1) It looks like the program runs slightly faster when I have 8 mappers, but why does it slow down as I increase the number of reducers? (e.g. 8 mappers / 2 reducers is faster than 8 mappers / 8 reducers)
(2) When I use only 4 mappers, it's a bit slower simply because I'm not utilizing the other 4 cores, right?
The number of mappers depends on the total size of the input, i.e. the total number of blocks of the input files.
By default, Hadoop runs 2 mappers and 2 reducers per data node; the number of mappers can be changed in the MapReduce configuration.
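To make those knobs concrete, here is a minimal driver sketch using the standard org.apache.hadoop.mapreduce API; the class name, input path, and split sizes are illustrative assumptions, not values taken from the question:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "split-size-sketch");
        job.setJarByClass(SplitSizeSketch.class);

        // The mapper count is not set directly: it follows from the number of
        // input splits, which depends on the block/split size of the input.
        FileInputFormat.addInputPath(job, new Path("/input/data"));   // hypothetical path

        // Smaller maximum split size -> more splits -> more mappers.
        FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024); // ~64 MB per split
        // Larger minimum split size -> fewer splits -> fewer mappers:
        // FileInputFormat.setMinInputSplitSize(job, 256L * 1024 * 1024);

        // The reducer count, in contrast, is set explicitly per job.
        job.setNumReduceTasks(2);

        // Mapper/reducer classes, output types and the output path would be
        // configured here before calling job.waitForCompletion(true).
    }
}
```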
Suppose your data size is small; then you don't need many mappers running to process the input files in parallel. However, if the <key,value> pairs generated by the mappers are large and diverse, then it makes sense to have more reducers because you can process more <key,value> pairs in parallel.
See the Hadoop documentation on Reducer to learn more. The right number of reducers seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * <no. of maximum containers per node>). With 0.95, all of the reducers can launch immediately and start transferring map outputs as the maps finish. With 1.75, the faster nodes finish their first round of reduces and launch a second wave, which does a better job of load balancing.
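As a worked example of that rule of thumb (the node and container counts below are assumptions loosely matching the single quad-core, hyper-threaded machine from the question):

```java
public class ReducerCountEstimate {
    // 0.95 / 1.75 rule of thumb: factor * (nodes * max containers per node).
    static int estimateReducers(int nodes, int maxContainersPerNode, double factor) {
        return (int) Math.floor(factor * nodes * maxContainersPerNode);
    }

    public static void main(String[] args) {
        int nodes = 1;              // assumed: a single machine
        int containersPerNode = 8;  // assumed: 8 logical cores / containers

        // 0.95: all reducers launch immediately and start fetching map
        // output as soon as the first maps finish.
        System.out.println(estimateReducers(nodes, containersPerNode, 0.95)); // 7

        // 1.75: faster nodes run a second wave of reducers, which improves
        // load balancing at the cost of extra task-startup overhead.
        System.out.println(estimateReducers(nodes, containersPerNode, 1.75)); // 14

        // The chosen value would then be applied with job.setNumReduceTasks(...).
    }
}
```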
The optimal number of mappers and reducers depends on many things.
The main thing to aim for is a balance between the CPU power used, the amount of data that is transported (into the mappers, between the mappers and reducers, and out of the reducers), and the disk head movements.
Each task in a MapReduce job works best if it can read/write its data with minimal disk head movements, usually described as "sequential reads/writes". But if the task is CPU bound, the extra disk head movements do not impact the job.
It seems to me that in this specific case you have
Possible ways to handle this kind of situation:
First do exactly what you did: Do some test runs and see which setting performs best given this specific job and your specific cluster.
Then you have three options:
Suggestions for shifting the load:
If CPU bound and all CPUs are fully loaded then reduce the CPU load:
If IO bound and you have some CPU capacity left: