Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get the input file name in the mapper in a Hadoop program?

How I can get the name of the input file within a mapper? I have multiple input files stored in the input directory, each mapper may read a different file, and I need to know which file the mapper has read.

like image 680
HHH Avatar asked Sep 25 '13 18:09

HHH


People also ask

What is input to mapper in Hadoop?

Hadoop Mapper is a function or task which is used to process all input records from a file and generate the output which works as input for Reducer. It produces the output by returning new key-value pairs.

What is the input to the mapper?

Mapper is a function which process the input data. The mapper processes the data and creates several small chunks of data. The input to the mapper function is in the form of (key, value) pairs, even though the input to a MapReduce program is a file or directory (which is stored in the HDFS).

What is the output of the mapper?

The output of the mapper is the full collection of key-value pairs. Before writing the output for each mapper task, partitioning of output take place on the basis of the key. Thus partitioning itemizes that all the values for each key are grouped together. Hadoop MapReduce generates one map task for each InputSplit.

What is mapper and reducer in Hadoop?

The mapper processes the data and creates several small chunks of data. Reduce stage − This stage is the combination of the Shuffle stage and the Reduce stage. The Reducer's job is to process the data that comes from the mapper. After processing, it produces a new set of output, which will be stored in the HDFS.


1 Answers

First you need to get the input split, using the newer mapreduce API it would be done as follows:

context.getInputSplit(); 

But in order to get the file path and the file name you will need to first typecast the result into FileSplit.

So, in order to get the input file path you may do the following:

Path filePath = ((FileSplit) context.getInputSplit()).getPath(); String filePathString = ((FileSplit) context.getInputSplit()).getPath().toString(); 

Similarly, to get the file name, you may just call upon getName(), like this:

String fileName = ((FileSplit) context.getInputSplit()).getPath().getName(); 
like image 75
Amar Avatar answered Sep 27 '22 17:09

Amar