
MapReduce job without mapper

This may be a very basic question, but it may still be helpful to many newbies like me.

Can there be an MR job without a mapper? Is there any scenario where we would need this, and how would we implement it?

DebD asked Nov 26 '13 13:11


People also ask

Is it possible to write a MapReduce program without a reducer?

Yes. Setting the number of reduce tasks to zero gives a map-only job, in which the map output is written directly as the job output. What becomes moot without a reducer is the combiner: its whole purpose is to combine map outputs so that they consume less network bandwidth when sent to the reducer.

How many mappers run for a MapReduce job?

The number of mappers in a MapReduce job depends on the total number of InputSplits. For example, a 1 GB file stored as 8 blocks of 128 MB yields 8 input splits, so 8 mappers run on the cluster. Number of mappers = number of input splits.
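The arithmetic above can be sketched as a ceiling division. This is only an approximation in plain Java: the class `SplitCountSketch` and the method `estimateMappers` are made-up names, and a real InputFormat may compute splits differently (for example, for non-splittable files or a slightly oversized final split).

```java
// Hypothetical helper: estimates map tasks as ceil(fileSize / splitSize).
public class SplitCountSketch {

    public static long estimateMappers(long fileSizeBytes, long splitSizeBytes) {
        if (fileSizeBytes < 0 || splitSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative, split size positive");
        }
        // Ceiling division: a partially filled last split still needs a mapper.
        return (fileSizeBytes + splitSizeBytes - 1) / splitSizeBytes;
    }

    public static void main(String[] args) {
        long oneGb = 1024L * 1024 * 1024;
        long split = 128L * 1024 * 1024;
        System.out.println(estimateMappers(oneGb, split)); // prints 8
    }
}
```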

How do we run a map-only MR job?

We can achieve this by calling job.setNumReduceTasks(0) in the driver. This sets the number of reducers to 0, so the mappers alone do all the work.

Can you control the number of mappers and reducers? How?

Not directly for mappers. The number of map tasks for a given job is driven by the number of input splits: one map task is spawned per input split. So we cannot change the number of mappers through configuration other than by changing the number of input splits. The number of reducers, by contrast, can be set explicitly with job.setNumReduceTasks.


2 Answers

IdentityMapper is a mapper that passes its input directly to the output, unchanged.

Suppose your input is already in key-value format and there is nothing you need to do with it in the map phase. If all you want is to group the values by key and run some aggregation over them in the reduce phase, you can use this mapper.
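As a rough illustration of that flow, here is a plain-Java sketch that does not use the Hadoop API at all; the class name `IdentityMapSketch` and its methods are made up for this example. The map step passes every pair through unchanged, the pairs are grouped by key (standing in for the shuffle), and the reduce step sums the values per key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free simulation of identity map -> group by key -> reduce.
public class IdentityMapSketch {

    // Identity map: emit each input pair unchanged.
    public static List<Map.Entry<String, Integer>> identityMap(List<Map.Entry<String, Integer>> input) {
        return new ArrayList<>(input);
    }

    // Stand-in for the shuffle: collect all values that share a key.
    public static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce: aggregate the values of each key (here, a sum).
    public static Map<String, Integer> reduceSum(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }

    public static Map<String, Integer> run(List<Map.Entry<String, Integer>> input) {
        return reduceSum(groupByKey(identityMap(input)));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = List.of(
                Map.entry("a", 1),
                Map.entry("b", 2),
                Map.entry("a", 3));
        System.out.println(run(input)); // prints {a=4, b=2}
    }
}
```

All the real work happens in the reduce step; the map step exists only so that the pairs reach the grouping stage.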

vishnu viswanath answered Nov 15 '22 10:11


If the MapReduce programmer does not set the mapper class using JobConf.setMapperClass, then IdentityMapper.class is used as the default value.

Even if you do not specify a mapper, a mapper still runs. So in any case at least one mapper will be running.
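The fallback behaviour can be pictured with a small plain-Java sketch. It is not Hadoop code: `DefaultMapperSketch`, `setMapper`, and `effectiveMapper` are hypothetical names that only mimic the idea of "use the identity mapper when none was set".

```java
import java.util.function.UnaryOperator;

// Hypothetical job configuration that falls back to an identity mapper.
public class DefaultMapperSketch {

    private UnaryOperator<String> mapper; // stays null until explicitly set

    public void setMapper(UnaryOperator<String> mapper) {
        this.mapper = mapper;
    }

    // A mapper is always returned: the configured one, or identity by default.
    public UnaryOperator<String> effectiveMapper() {
        return mapper != null ? mapper : UnaryOperator.identity();
    }

    public static void main(String[] args) {
        DefaultMapperSketch job = new DefaultMapperSketch();
        System.out.println(job.effectiveMapper().apply("record")); // prints record

        job.setMapper(String::toUpperCase);
        System.out.println(job.effectiveMapper().apply("record")); // prints RECORD
    }
}
```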

http://www.fromdev.com/2010/12/interview-questions-hadoop-mapreduce.html

user. answered Nov 15 '22 10:11