
Working of RecordReader in Hadoop

Can anyone explain how the RecordReader actually works? How do the methods nextKeyValue(), getCurrentKey() and getProgress() work once the program starts executing?

Asked by Amnesiac, Jun 08 '12 05:06



1 Answer

In the new API (org.apache.hadoop.mapreduce), the default Mapper class has a run method which looks like this:

public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    while (context.nextKeyValue()) {
        map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
    cleanup(context);
}

The Context.nextKeyValue(), Context.getCurrentKey() and Context.getCurrentValue() methods are wrappers that delegate to the RecordReader methods of the same names. See the source file src/mapred/org/apache/hadoop/mapreduce/MapContext.java.

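So this loop runs once per record: each time nextKeyValue() returns true, it calls your Mapper implementation's map(K, V, Context) method with the current key and value, and it stops when nextKeyValue() returns false.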
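
To make the loop concrete, here is a minimal, Hadoop-free sketch of the same pattern. Everything in it is an illustrative assumption rather than Hadoop's actual code: SimpleRecordReader is a hypothetical stand-in for the real RecordReader, LineReader is a toy reader over an in-memory string (loosely mimicking a line-oriented reader that uses the line's starting offset as the key), and runMapper collapses the Context wrapper and the map() call into one method.

```java
import java.util.ArrayList;
import java.util.List;

public class RecordReaderSketch {

    /** Hypothetical stand-in for Hadoop's RecordReader; not the real class. */
    interface SimpleRecordReader<K, V> {
        boolean nextKeyValue();  // advance to the next record; false when input is exhausted
        K getCurrentKey();       // key of the record most recently read
        V getCurrentValue();     // value of the record most recently read
        float getProgress();     // fraction of the input consumed, 0.0 to 1.0
    }

    /** Toy reader: key = character offset of the line start, value = line text. */
    static class LineReader implements SimpleRecordReader<Long, String> {
        private final String data;
        private int pos = 0;
        private Long key;
        private String value;

        LineReader(String data) {
            this.data = data;
        }

        public boolean nextKeyValue() {
            if (pos >= data.length()) {
                key = null;
                value = null;
                return false;
            }
            int end = data.indexOf('\n', pos);
            if (end < 0) {
                end = data.length();  // last line has no trailing newline
            }
            key = (long) pos;
            value = data.substring(pos, end);
            pos = Math.min(end + 1, data.length());
            return true;
        }

        public Long getCurrentKey() {
            return key;
        }

        public String getCurrentValue() {
            return value;
        }

        public float getProgress() {
            return data.isEmpty() ? 1.0f : (float) pos / data.length();
        }
    }

    /** Same shape as Mapper.run(): pull records until the reader says there are no more. */
    static List<String> runMapper(SimpleRecordReader<Long, String> reader) {
        List<String> out = new ArrayList<>();
        while (reader.nextKeyValue()) {
            // Stand-in for map(key, value, context): just record what was seen.
            out.add(reader.getCurrentKey() + ":" + reader.getCurrentValue());
        }
        return out;
    }

    public static void main(String[] args) {
        LineReader reader = new LineReader("a\nbb\nccc");
        System.out.println(runMapper(reader));    // [0:a, 2:bb, 5:ccc]
        System.out.println(reader.getProgress()); // 1.0
    }
}
```

The point of the sketch is that the reader holds the state (position, current key, current value), while the loop only asks "is there another record?" and then fetches the current pair. getProgress() is not called inside the loop itself; it just reports how far through its input the reader has advanced.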

Specifically, what else would you like to know?

Answered by Chris White, Sep 22 '22 08:09