
How to write 'map only' hadoop jobs?

I'm new to Hadoop and am getting familiar with the MapReduce style of programming, but I've run into a problem: sometimes a job needs only the map phase, and I want the map results written directly as the job's output. The reduce phase isn't needed at all. How can I achieve that?

Breakinen asked Feb 22 '12 12:02




1 Answer

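Set the number of reduce tasks to zero. This turns off the reduce phase, so each mapper's output is written directly as the job's output: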

job.setNumReduceTasks(0); 

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setNumReduceTasks(int)
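For context, here is a minimal sketch of how that call might sit inside a complete driver. It assumes Hadoop 2 or later (for `Job.getInstance`; on older releases `new Job(conf, name)` plays the same role). The class names `MapOnlyJob` and `UpperCaseMapper`, the `buildJob` helper, and the upper-casing logic are all hypothetical and only there for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyJob {

    // Hypothetical mapper: upper-cases every input line and emits it as the value.
    public static class UpperCaseMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {
        private final Text result = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            result.set(line.toString().toUpperCase());
            // With zero reducers, this write goes straight to the output format.
            context.write(NullWritable.get(), result);
        }
    }

    // Hypothetical helper that configures the job without submitting it.
    public static Job buildJob(Configuration conf, String in, String out)
            throws IOException {
        Job job = Job.getInstance(conf, "map-only example");
        job.setJarByClass(MapOnlyJob.class);
        job.setMapperClass(UpperCaseMapper.class);

        // The line from the answer: no reducers, so no shuffle or sort either.
        job.setNumReduceTasks(0);

        // In a map-only job these describe the mapper's output types.
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(in));
        FileOutputFormat.setOutputPath(job, new Path(out));
        return job;
    }

    public static void main(String[] args) throws Exception {
        // args[0] = input path, args[1] = output path (must not exist yet).
        Job job = buildJob(new Configuration(), args[0], args[1]);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With zero reduce tasks, the output directory contains one `part-m-NNNNN` file per map task instead of the usual `part-r-NNNNN` files, and the records in each file are not sorted by key.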

Thomas Jungblut answered Sep 22 '22 08:09