This may be a very basic question, but it may still be helpful to many newbies like me.
Can there be a MapReduce job without a mapper? Is there any scenario where we would need this, and how would we implement it?
The whole purpose of the combiner is to 'combine' parts of the mapper output so that the map output consumes less network bandwidth when sent over to the reducer. Without a reducer, the whole point of using a combiner becomes moot. Hence, no, it is not possible.
The number of mappers in a MapReduce job depends on the total number of InputSplits. If you have a 1 GB file stored as 8 blocks (of 128 MB each), there will be only 8 mappers running on the cluster. Number of mappers = number of input splits.
We can achieve this by calling job.setNumReduceTasks(0) in the driver. This sets the number of reducers to 0, so the mappers alone do the complete work (a map-only job).
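A minimal sketch of such a map-only driver using the newer mapreduce API is shown below; the class and path names (MapOnlyDriver, PassThroughMapper, args[0]/args[1]) are hypothetical placeholders, not from the original answer.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MapOnlyDriver {

    // A trivial mapper that emits each input record unchanged.
    public static class PassThroughMapper
            extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws java.io.IOException, InterruptedException {
            context.write(key, value);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "map-only example");
        job.setJarByClass(MapOnlyDriver.class);

        job.setMapperClass(PassThroughMapper.class);
        // Zero reducers: map output is written straight to the output path,
        // with no shuffle/sort phase at all.
        job.setNumReduceTasks(0);

        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```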
No. The number of map tasks for a given job is driven by the number of input splits: for each input split, a map task is spawned. So we cannot directly set the number of mappers with a configuration property; we can only change the number of input splits.
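As a hedged illustration of that point, the sketch below does not set a mapper count directly but caps the split size, which in turn changes how many map tasks are spawned; the 64 MB value is an arbitrary example, not a recommendation.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeExample {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "split-size example");

        // Cap each split at 64 MB: a 1 GB input then yields ~16 splits,
        // and therefore ~16 map tasks instead of 8.
        FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);

        // Equivalent configuration property (newer API):
        // job.getConfiguration().setLong(
        //         "mapreduce.input.fileinputformat.split.maxsize",
        //         64L * 1024 * 1024);
    }
}
```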
IdentityMapper is a mapper which maps input directly to output.
Suppose your input is already in key-value form and there is nothing you need to do with it in the mapper phase: all you want is to group the values by key and perform some aggregation on them in the reducer phase. In that case you can use this mapper.
If the MapReduce programmer does not set the mapper class using JobConf.setMapperClass, then IdentityMapper.class is used as the default.
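Below is a minimal sketch of that pattern using the newer mapreduce API, where the base Mapper class plays the same identity role when no mapper is set: the input is assumed to be key&lt;TAB&gt;value text lines, and the reducer does the per-key aggregation (here, a simple count). The class names IdentityMapJob and CountReducer are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IdentityMapJob {

    // Counts how many values arrive for each key.
    public static class CountReducer
            extends Reducer<Text, Text, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws java.io.IOException, InterruptedException {
            int count = 0;
            for (Text ignored : values) {
                count++;
            }
            context.write(key, new IntWritable(count));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "identity-mapper example");
        job.setJarByClass(IdentityMapJob.class);

        // Note: no setMapperClass() call -- the default mapper simply
        // passes each (key, value) pair through to the shuffle.
        job.setReducerClass(CountReducer.class);

        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```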
Even if you do not specify a mapper, one will still run; so in any case at least one mapper will be running.
http://www.fromdev.com/2010/12/interview-questions-hadoop-mapreduce.html