I have an Uber jar that performs some Cascading ETL tasks. The jar is executed like this:
hadoop jar munge-data.jar
I'd like to pass arguments to the jar when the job is launched, e.g.
hadoop jar munge-data.jar -Denv=prod
Different credentials, hostnames, etc... will be read from properties files depending on the environment.
This would work if the job were executed with java -Denv=prod -jar munge-data.jar, since the env system property could then be read with:
System.getProperty("env")
However, this doesn't work when the jar is executed with hadoop jar ....
I saw a similar thread where the answerer states that properties can be accessed using what looks like the org.apache.hadoop.conf.Configuration class. It wasn't clear to me from the answer how the conf object gets created. I tried the following and it returned null:
Configuration configuration = new Configuration();
System.out.println(configuration.get("env"));
Presumably, the configuration properties need to be set somewhere before they can be read. Can you tell me how I can pass properties, e.g. hadoop jar [...] -DsomeProperty=someValue, into my ETL job?
You can pass the arguments in two ways: either with the -D option or by setting them on the Configuration object. The -D option is only picked up automatically when your driver implements the Tool interface and is launched through ToolRunner; otherwise you have to set the configuration variables yourself with conf.set.
Passing parameters using -D:
hadoop jar example.jar com.example.driver -D property=value /input/path /output/path
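For the -D flags to be applied, the driver has to be run through ToolRunner, which parses the generic options and feeds them into the job's Configuration. A minimal sketch of such a driver (the class name, job name, and mapper/reducer wiring are placeholders, not from the original post):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ExampleDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already contains any -D key=value pairs parsed by ToolRunner
        Configuration conf = getConf();
        System.out.println("env = " + conf.get("env"));

        Job job = Job.getInstance(conf, "munge-data");
        job.setJarByClass(ExampleDriver.class);
        // ... set mapper/reducer classes and input/output paths from args here

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options (-D, -conf, -files, ...) before
        // handing the remaining arguments to run()
        int exitCode = ToolRunner.run(new Configuration(), new ExampleDriver(), args);
        System.exit(exitCode);
    }
}

With a driver like this, hadoop jar munge-data.jar ExampleDriver -D env=prod ... makes conf.get("env") return "prod" inside the job.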
Passing parameters using Configuration:
Configuration conf = new Configuration();
conf.set("property", "value");
Job job = Job.getInstance(conf);
Note: All the configuration variables have to be set before the Job is created, because the Job takes its own copy of the Configuration; changes made to conf afterwards are not seen by the job.
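Either way the value ends up in the job's Configuration, so inside a Mapper or Reducer it can be read back from the task context. A rough sketch, assuming a simple text-processing mapper (the class name, key/value types, and default value are illustrative only):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ExampleMapper extends Mapper<LongWritable, Text, Text, Text> {

    private String property;

    @Override
    protected void setup(Context context) {
        // Reads the value set either via -D property=value or conf.set("property", "value")
        property = context.getConfiguration().get("property", "defaultValue");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... use 'property' to drive environment-specific behaviour
    }
}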