pass Hadoop arguments into Java code

Tags: java, jar, hadoop

I have an Uber jar that performs some Cascading ETL tasks. The jar is executed like this:

hadoop jar munge-data.jar

I'd like to pass arguments to the jar when the job is launched, e.g.

hadoop jar munge-data.jar -Denv=prod

Different credentials, hostnames, etc... will be read from properties files depending on the environment.

This would work if the jar were executed with java -Denv=prod -jar munge-data.jar, since the env property could then be accessed:

System.getProperty("env")

However, this doesn't work when the jar is executed with hadoop jar ...

I saw a similar thread where the answerer states that properties can be accessed using what looks like the org.apache.hadoop.conf.Configuration class. It wasn't clear to me from the answer how the conf object gets created. I tried the following and it returned null:

Configuration configuration = new Configuration();
System.out.println(configuration.get("env"));

Presumably, the configuration properties need to be read/set.

Can you tell me how I can pass properties, e.g. hadoop jar [...] -DsomeProperty=someValue, into my ETL job?

asked Oct 09 '15 by Alex Woolford
1 Answer

You can pass the arguments in two ways: with the -D option, or by setting them on the Configuration object. The -D option only works when your driver implements the Tool interface and is launched through ToolRunner, which parses the generic options. Otherwise you have to set the configuration variables yourself with conf.set.

Passing parameters using -D:

hadoop jar example.jar com.example.driver -D property=value /input/path /output/path
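For the -D route to work, the driver has to implement Tool and be run through ToolRunner, which strips the -D key=value pairs out of the argument list and applies them to the job's Configuration before run() is called. A minimal sketch, assuming a hypothetical driver class named MungeDriver:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MungeDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // ToolRunner has already parsed generic options such as
        // -D env=prod into this Configuration
        Configuration conf = getConf();
        String env = conf.get("env", "dev"); // "dev" is an assumed fallback default
        System.out.println("env = " + env);
        // ... build and submit the ETL job here ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner forwards the remaining (non-generic) arguments to run()
        System.exit(ToolRunner.run(new Configuration(), new MungeDriver(), args));
    }
}

With this in place, a command like hadoop jar munge-data.jar com.example.MungeDriver -D env=prod makes the property available via conf.get("env") inside run().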

Passing parameters using Configuration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
conf.set("property", "value"); // set before the Job is created
Job job = Job.getInstance(conf);

Note: all the configuration variables have to be set before the Job instance is created, because Job takes a copy of the Configuration at construction time; changes made afterwards are not seen by the job.
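A minimal sketch of the pitfall this note is warning about: since the Job copies the Configuration when it is created, a property set afterwards never reaches the job.

Configuration conf = new Configuration();
Job job = Job.getInstance(conf);
conf.set("property", "value");          // too late: job already copied conf
job.getConfiguration().get("property"); // returns null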

answered Sep 29 '22 by Vignesh I