Passing arguments to Hadoop mappers

I'm using the new Hadoop API and am looking for a way to pass some parameters (a few strings) to my mappers.
How can I do that?

This solution works for the old API:

JobConf job = (JobConf)getConf();
job.set("NumberOfDocuments", args[0]);

Here, "NumberOfDocuments" is the name of the parameter, and its value is read from args[0], a command-line argument. Once you have set this parameter, you can retrieve its value in the reducer or mapper as follows:

private static Long N;
public void configure(JobConf job) {
     N = Long.parseLong(job.get("NumberOfDocuments"));
}

Note, the tricky part is that you cannot set parameters like this:

Configuration con = new Configuration();
con.set("NumberOfDocuments", args[0]);

asked Nov 23 '11 15:11 by wlk


1 Answer

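In the main method, set the required parameter as shown below, or pass it with the -D command-line option when running the job (the -D route only works if the driver parses Hadoop's generic options, for example by running through ToolRunner).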

Configuration conf = new Configuration();
conf.set("test", "123");

Job job = new Job(conf);
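
For the -D route, the driver has to let Hadoop parse the generic options before the job is built. The following is a minimal sketch, assuming Hadoop 2 or later on the classpath (where Job.getInstance replaces the older new Job(conf) constructor). The class name ParamDriverSketch is hypothetical, and the job simply runs the default identity mapper and reducer.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver, for illustration only.
public class ParamDriverSketch extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // ToolRunner has already applied any -D key=value pairs to this
        // Configuration and removed them from args.
        Configuration conf = getConf();

        Job job = Job.getInstance(conf, "param driver sketch");
        job.setJarByClass(ParamDriverSketch.class);

        // No mapper or reducer is set, so Hadoop uses the identity defaults.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new ParamDriverSketch(), args));
    }
}
```

Such a driver could be launched as, for example, hadoop jar app.jar ParamDriverSketch -D test=123 <input> <output>, where app.jar is a placeholder name. Generic options such as -D go before the application's own arguments.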

In the mapper/reducer, get the parameter as follows:

Configuration conf = context.getConfiguration();
String param = conf.get("test");
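
Put together, the two halves might look like the sketch below, which reads the parameter once in the mapper's setup() method rather than on every map() call. It assumes Hadoop 2 or later on the classpath. The class names, the buildConf helper, and the choice of key and value types are illustrative assumptions; the key name "NumberOfDocuments" is borrowed from the question.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical class, for illustration only.
public class ParamPassingSketch {

    // Key name taken from the question.
    public static final String DOC_COUNT_KEY = "NumberOfDocuments";

    public static class DocMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private long numberOfDocuments;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // Called once per task, before any map() call.
            Configuration conf = context.getConfiguration();
            // getLong returns the default (0 here) if the key was never set.
            numberOfDocuments = conf.getLong(DOC_COUNT_KEY, 0L);
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit each input line with the parameter, just to show it is available.
            context.write(value, new LongWritable(numberOfDocuments));
        }
    }

    // Hypothetical helper: builds a Configuration that carries the parameter.
    public static Configuration buildConf(String numberOfDocuments) {
        Configuration conf = new Configuration();
        conf.set(DOC_COUNT_KEY, numberOfDocuments);
        return conf;
    }

    public static void main(String[] args) throws Exception {
        // Set the parameter before creating the Job: the Job works on its own
        // copy of the Configuration, so later changes to conf are not seen.
        Configuration conf = buildConf(args[0]);
        Job job = Job.getInstance(conf, "param passing sketch");
        job.setJarByClass(ParamPassingSketch.class);
        job.setMapperClass(DocMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        // Input/output paths and job.waitForCompletion(true) are omitted here.
    }
}
```

If a value has to be changed after the Job exists, setting it through job.getConfiguration() rather than the original conf object should reach the tasks.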

answered Sep 26 '22 09:09 by Praveen Sripati