
Why is the setMapOutputKeyClass method necessary in a MapReduce job?

When I write a MapReduce program, I often write code like

 job1.setMapOutputKeyClass(Text.class); 

But why should we specify the map output key class explicitly? We have already specified it in the mapper class, such as

public static class MyMapper extends
        Mapper<LongWritable, Text, Text, Text>

In the book Hadoop: The Definitive Guide, a table (Properties for configuring types) shows that the setMapOutputKeyClass method is optional, but in my tests I found it is necessary; otherwise the Eclipse console shows

Type mismatch in key from map: expected org.apache.hadoop.io.LongWritable, received org.apache.hadoop.io.Text

Can someone tell me the reason for this?

In the book, it says

"The settings that have to be compatible with the MapReduce types are listed in the lower part of Table 8-1". Does this mean we have to set the properties in the lower part, but do not have to set the ones in the upper part?

The content of the table looks like this:

Properties for configuring types:
mapreduce.job.inputformat.class  
mapreduce.map.output.key.class  
mapreduce.map.output.value.class  
mapreduce.job.output.key.class  
mapreduce.job.output.value.class 

Properties that must be consistent with the types:
mapreduce.job.map.class   
mapreduce.job.combine.class  
mapreduce.job.partitioner.class  
mapreduce.job.output.key.comparator.class 
mapreduce.job.output.group.comparator.class  
mapreduce.job.reduce.class  
mapreduce.job.outputformat.class
asked Apr 26 '26 09:04 by Coinnigh


1 Answer

setMapOutputKeyClass() and setMapOutputValueClass() are optional as long as your mapper's output types match the job's output types specified by setOutputKeyClass() and setOutputValueClass() respectively. In other words, if your mapper output types do not match your reducer output types, you have to use one or both of these methods.
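The fallback described above can be sketched in plain Java. This is a simplified model, not Hadoop's code: the class MapOutputKeyDefault, the method expectedMapOutputKey, and the nested LongWritable and Text stand-ins are hypothetical, and the lookup order (map output key class, then job output key class, then LongWritable) is an assumption consistent with the error message in the question. Only the two property names come from the table in the question.

```java
import java.util.HashMap;
import java.util.Map;

public class MapOutputKeyDefault {
    // Stand-ins for org.apache.hadoop.io.LongWritable and Text,
    // so the sketch compiles without Hadoop on the classpath.
    static class LongWritable {}
    static class Text {}

    static final String MAP_KEY = "mapreduce.map.output.key.class";
    static final String JOB_KEY = "mapreduce.job.output.key.class";

    // Assumed lookup order: explicit map output key class first,
    // then the job output key class, then LongWritable as the last resort.
    static Class<?> expectedMapOutputKey(Map<String, Class<?>> conf) {
        Class<?> mapKey = conf.get(MAP_KEY);
        if (mapKey != null) {
            return mapKey;
        }
        Class<?> jobKey = conf.get(JOB_KEY);
        return jobKey != null ? jobKey : LongWritable.class;
    }

    public static void main(String[] args) {
        Map<String, Class<?>> conf = new HashMap<>();

        // Nothing configured: the mapper emits Text but LongWritable is expected.
        System.out.println(expectedMapOutputKey(conf).getSimpleName());

        // Only the job output key class is set: the map output key follows it.
        conf.put(JOB_KEY, Text.class);
        System.out.println(expectedMapOutputKey(conf).getSimpleName());
    }
}
```

Under these assumptions, a job that sets neither class expects LongWritable keys from the mapper, which would match the "expected org.apache.hadoop.io.LongWritable, received org.apache.hadoop.io.Text" message.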

As for your question regarding the generic arguments: due to Java type erasure (see "Java generics type erasure: when and what happens?"), the compiler knows them, but Hadoop does not use them at runtime. It relies on the classes set in the job configuration instead.
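A minimal sketch of what type erasure means in practice, using only the standard library. The ErasureDemo class and its Collector are hypothetical and only loosely resemble a component that validates map output keys; the point is that a generic parameter alone gives no runtime class to check against, so an explicit Class object has to be passed in.

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    // Hypothetical collector that validates keys at runtime.
    // K is erased, so the expected class must be supplied explicitly.
    static class Collector<K> {
        private final Class<?> expectedKeyClass;

        Collector(Class<?> expectedKeyClass) {
            this.expectedKeyClass = expectedKeyClass;
        }

        void collect(K key) {
            if (key.getClass() != expectedKeyClass) {
                throw new IllegalStateException("Type mismatch in key: expected "
                        + expectedKeyClass.getName()
                        + ", received " + key.getClass().getName());
            }
        }
    }

    // Two lists with different type arguments share one runtime class.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Long> longs = new ArrayList<>();
        return strings.getClass() == longs.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // true

        // The generic argument says String, but the configured class says Long.
        // The compiler accepts this; the mismatch only shows up at runtime.
        Collector<String> collector = new Collector<>(Long.class);
        try {
            collector.collect("hello");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The Class argument to the constructor plays the same role here that setMapOutputKeyClass(Text.class) plays in the job setup from the question.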

answered Apr 29 '26 05:04 by yurgis


