I am working on a MapReduce problem, but I am stuck at one point: how can I pass a List<Text> as the Mapper output? Is it possible or not? If yes, how do we tell the configuration about the Mapper output class?
The output of the mapper is the full collection of intermediate key-value pairs. Before the output of each map task is written, it is partitioned on the basis of the key; partitioning guarantees that all the values for a given key land in the same partition, and hence at the same reducer. Hadoop MapReduce generates one map task for each InputSplit.
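To make the key-based grouping concrete, here is a sketch that mirrors Hadoop's default HashPartitioner; the class name KeyHashPartitioner and the Text/Text type parameters are illustrative assumptions, not part of the question:
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Mirrors the default hash partitioning: every record with the same key
// maps to the same partition number, so its values are grouped together.
public class KeyHashPartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        // Mask the sign bit so the result of the modulo is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}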
Can we configure mappers to write their output to HDFS? No: the output of the Mapper is not written to HDFS, because every block written to HDFS is replicated across datanodes according to the replication factor, and the namenode must hold metadata for each block. Paying that cost for short-lived intermediate data would be wasted work.
The Mapper task is the first phase of processing: it processes each input record (delivered by the RecordReader) and generates an intermediate key-value pair. The Hadoop Mapper stores this intermediate output on the local disk of the node where it runs.
The Mapper is a function that processes the input data and produces several small chunks of data. The input to the mapper function arrives as (key, value) pairs, even though the input to a MapReduce program as a whole is a file or directory stored in HDFS.
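As a minimal sketch of that shape (the class name LineMapper and the tokenizing logic are illustrative, not taken from the question), a Mapper receives one record per map() call and emits intermediate pairs through the Context:
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// For a text input, the RecordReader delivers each line as a
// (byte offset, line contents) pair; the mapper emits intermediate
// key-value pairs via context.write().
public class LineMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Illustrative logic: key each line by its first token.
        String[] parts = line.toString().split("\\s+", 2);
        if (parts.length == 2) {
            context.write(new Text(parts[0]), new Text(parts[1]));
        }
    }
}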
You may use the ArrayWritable class as the value object emitted from your mapper class. Please refer to the code snippet below for your mapper class:
// Inside your Mapper's map() method (requires org.apache.hadoop.io.ArrayWritable
// and org.apache.hadoop.io.Text):
ArrayWritable arrayWritable = new ArrayWritable(Text.class);
Text[] textValues = new Text[2];
textValues[0] = new Text("value1");
textValues[1] = new Text("value2");
arrayWritable.set(textValues);
context.write(key, arrayWritable);
Then set the map output value class as follows in your driver class:
job.setMapOutputValueClass(ArrayWritable.class);
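One caveat worth hedging: ArrayWritable has no no-argument constructor, and Hadoop instantiates value classes reflectively when it deserializes map output on the reduce side, so registering ArrayWritable.class directly typically fails at runtime. The usual workaround is a small subclass; the name TextArrayWritable below is my own choice, not from the original answer:
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Text;

// This subclass exists only to supply the no-arg constructor that
// Hadoop's reflection-based deserialization requires.
public class TextArrayWritable extends ArrayWritable {
    public TextArrayWritable() {
        super(Text.class);
    }
    public TextArrayWritable(Text[] values) {
        super(Text.class, values);
    }
}
With that in place, the driver would register job.setMapOutputValueClass(TextArrayWritable.class); and the mapper would emit a TextArrayWritable instead of a bare ArrayWritable.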