Hadoop command line -D options not working

I am trying to pass a variable (not a property) with the -D command-line option in Hadoop, e.g. -Dmapred.mapper.mystring=somexyz. I can set a conf property in the driver program and read it back in the mapper, so I could pass my string as an additional argument and set it in the driver. But I want to know whether the -D option can be used to do the same.

My command is:

$HADOOP_HOME/bin/hadoop jar  /home/hduser/Hadoop_learning_path/toolgrep.jar /home/hduser/hadoopData/inputdir/ /home/hduser/hadoopData/grepoutput -Dmapred.mapper.mystring=somexyz

Driver program

String s_ptrn=conf.get("mapred.mapper.regex");

System.out.println("debug: in Tool Class mapred.mapper.regex "+s_ptrn + "\n"); // prints null

But this works:

conf.set("DUMMYVAL","100000000000000000000000000000000000000");

Set in the driver, this value is read correctly in the mapper with the get method.

My question is: if everything on the Internet says I can use the -D option, why can't I? Is it that -D cannot be used for arbitrary arguments, only for properties, which I would have to put in a file, load in the driver program, and then use?

Something like

Configuration conf = new Configuration();
conf.addResource("~/conf.xml"); 

in the driver program, and is this the only way?

vivek ashodha asked Jul 08 '14 12:07

2 Answers

As Thomas wrote, you are missing the space. You are also passing the variable mapred.mapper.mystring on the command line, but in the code you are reading mapred.mapper.regex. If you want to use the -D parameter, you should use the Tool interface. More about it here: Hadoop: Implementing the Tool interface for MapReduce driver.

Or you can parse your CLI arguments like this:

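// Needs: org.apache.hadoop.conf.Configuration, org.apache.hadoop.util.GenericOptionsParser
@Override
public int run(String[] args) throws Exception {
    Configuration conf = this.getConf();

    // Strip Hadoop's generic options (e.g. -D) and keep the remaining arguments
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    String yourVariable = null;
    int i = 0;
    while (i < otherArgs.length) {
        if (otherArgs[i].equals("-x") && i + 1 < otherArgs.length) {
            // Save your CLI argument
            yourVariable = otherArgs[++i];
        }
        i++;
    }
    // then save yourVariable into conf for use in the map phase
    return 0; // placeholder: set up and run the job here
}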

Then your command can look like this:

$HADOOP_HOME/bin/hadoop jar /home/hduser/Hadoop_learning_path/toolgrep.jar /home/hduser/hadoopData/inputdir/ /home/hduser/hadoopData/grepoutput -x yourVariable

Hope it helps

Radek Tomšej answered Oct 13 '22 15:10

To use the -D option with the hadoop jar command correctly, use the following syntax:

hadoop jar {hadoop-jar-file-path} {job-main-class} -D {generic options} {input-directory} {output-directory}

Hence the -D option should be placed right after the job main class name, i.e. in the third position. This is because when we issue the hadoop jar command, the hadoop script invokes the main() of the RunJar class. This main() uses the first argument to put the job jar file on the classpath and the second argument to invoke the job class's main().

Once the job class's main() is called (and the driver is launched through ToolRunner), control is transferred to GenericOptionsParser, which first parses the generic command-line arguments (if any) and sets them in the job's configuration object. The job class's run() is then called with the remaining arguments (i.e. the input and output paths).
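To illustrate why the position of -D matters, here is a minimal plain-Java sketch of the ordering rule described above. It is not Hadoop's GenericOptionsParser: the class GenericOptionsSketch and its parse method are hypothetical stand-ins. It also assumes that option parsing stops at the first argument that is not a -D option, which is consistent with the advice to put -D before the input and output directories.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of generic-option handling; NOT Hadoop code.
class GenericOptionsSketch {

    // Result of a parse: the -D properties plus the arguments left for the job.
    static final class Parsed {
        final Map<String, String> properties = new LinkedHashMap<>();
        final List<String> remaining = new ArrayList<>();
    }

    // Consumes leading "-D key=value" or "-Dkey=value" options and stops at the
    // first argument that is not one (assumed stop-at-first-non-option rule).
    static Parsed parse(String[] args) {
        Parsed parsed = new Parsed();
        int i = 0;
        while (i < args.length && args[i].startsWith("-D")) {
            String pair;
            int consumed;
            if (args[i].equals("-D")) {
                if (i + 1 >= args.length) {
                    break; // dangling "-D" without a value: leave it for the job
                }
                pair = args[i + 1];
                consumed = 2;
            } else {
                pair = args[i].substring(2);
                consumed = 1;
            }
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                break; // not a key=value pair: treat it as an ordinary argument
            }
            parsed.properties.put(pair.substring(0, eq), pair.substring(eq + 1));
            i += consumed;
        }
        // Everything from the first non-option onwards is handed to the job untouched.
        parsed.remaining.addAll(Arrays.asList(args).subList(i, args.length));
        return parsed;
    }

    public static void main(String[] args) {
        // -D after the paths, as in the question: the option loop never reaches it.
        Parsed late = parse(new String[] {
            "inputdir", "grepoutput", "-Dmapred.mapper.mystring=somexyz"});
        System.out.println(late.properties.get("mapred.mapper.mystring")); // null
        System.out.println(late.remaining.size());                         // 3

        // -D before the paths: the property is set and only the paths remain.
        Parsed early = parse(new String[] {
            "-D", "mapred.mapper.mystring=somexyz", "inputdir", "grepoutput"});
        System.out.println(early.properties.get("mapred.mapper.mystring")); // somexyz
        // Reading a different key than the one passed still yields null.
        System.out.println(early.properties.get("mapred.mapper.regex"));    // null
    }
}
```

In this model the command from the question leaves the -D argument among the remaining arguments, so the property is never set. The last line shows the second problem pointed out in the other answer: the key read in the driver has to match the key passed on the command line.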

Hitesh answered Oct 13 '22 13:10