I'm new to Hadoop and am getting familiar with the MapReduce programming style, but I've run into a problem: sometimes a job needs only the map phase, and I want the map results written directly as output. The reduce phase is not needed in that case. How can I achieve that?
The reducer uses the data types specific to Hadoop MapReduce (lines 50-52). The reduce(Object, Iterable, Context) method is called for each <key, (collection of values)> in the sorted inputs. The output of the reduce task is written to a RecordWriter via TaskInputOutputContext.write(Object, Object) (lines 54-56).
In conclusion, a map-only job in Hadoop reduces network congestion by avoiding the shuffle, sort, and reduce phases. The mapper takes care of all processing and produces the final output. We can achieve this by using the job.
A map-only job in Hadoop is one in which the mapper does all the work and no task is performed by a reducer, so the mapper's output is the final output. MapReduce is the data processing layer of Hadoop. It processes large volumes of structured and unstructured data stored in HDFS.
Setting the number of reduce tasks to zero turns off the reducer:
job.setNumReduceTasks(0);
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setNumReduceTasks(int)
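To show where that call fits, here is a minimal sketch of a complete map-only driver using the org.apache.hadoop.mapreduce API. The class names MapOnlyDriver and UpperCaseMapper, the upper-casing logic, and the use of args[0]/args[1] as input and output paths are all assumptions made for illustration; they are not from the question. With zero reduce tasks, the mapper's output should go straight to the output format, and the default text output typically produces files named part-m-00000 and so on.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver class; the name is an assumption for this sketch.
public class MapOnlyDriver {

    // Hypothetical mapper: upper-cases each input line and emits it.
    // The NullWritable key means only the value is written to the text output.
    public static class UpperCaseMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {
        private final Text out = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            out.set(line.toString().toUpperCase());
            context.write(NullWritable.get(), out);
        }
    }

    // Builds the job configuration without submitting it.
    public static Job buildJob(Configuration conf, String input, String output)
            throws IOException {
        Job job = Job.getInstance(conf, "map-only example");
        job.setJarByClass(MapOnlyDriver.class);
        job.setMapperClass(UpperCaseMapper.class);

        // Zero reduce tasks: no shuffle, sort, or reduce phase is run.
        job.setNumReduceTasks(0);

        // With no reducer, these describe the mapper's (final) output types.
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(input));
        FileOutputFormat.setOutputPath(job, new Path(output));
        return job;
    }

    public static void main(String[] args) throws Exception {
        // args[0] = input path, args[1] = output path (assumed convention).
        Job job = buildJob(new Configuration(), args[0], args[1]);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that no reducer class is set at all. If one were set, it would simply be ignored once the number of reduce tasks is 0.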