I'm running the classic hadoop word count program and couldn't really figure out how GenericOptionsParser works in the following scenario.
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
Command to run the word count program:
hadoop jar /home/hduser/WordCount/wordcount.jar WordCount input output
From the above command, GenericOptionsParser picks up input as otherArgs[0] and output as otherArgs[1]. Why doesn't it pick up WordCount as an argument? How exactly does it work?
I've looked at the GenericOptionsParser source code from hadoop utils but couldn't make much sense of it. Any guidance would be really helpful...
If the jar you are using here (wordcount.jar) is hadoop-examples*.jar, then it is a runnable jar whose main class is org.apache.hadoop.examples.ExampleDriver.
The first argument is the example name (wordcount, teragen, terasort, etc.). If it is a valid example name, the driver consumes it and filters it out of the argument list.
See the following method
org.apache.hadoop.util.ProgramDriver#driver(String[] args)
After this initial filtering, the example class org.apache.hadoop.examples.WordCount
is invoked with the remaining arguments (input output). org.apache.hadoop.examples.WordCount is not called directly.
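To make the filtering step concrete, here is a minimal sketch in plain Java. It is a toy stand-in for a ProgramDriver-style dispatcher, not Hadoop's actual code: the class DriverSketch and its methods addProgram and dispatch are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Toy stand-in for a ProgramDriver-style dispatcher (hypothetical, not Hadoop's code).
class DriverSketch {
    // Maps an example name to the "main method" that receives the remaining arguments.
    private final Map<String, Consumer<String[]>> programs = new LinkedHashMap<>();

    public void addProgram(String name, Consumer<String[]> main) {
        programs.put(name, main);
    }

    // Consumes args[0] as the program name and passes the rest to that program.
    // Returns the arguments that were passed on, or null if the name is unknown.
    public String[] dispatch(String[] args) {
        if (args.length == 0 || !programs.containsKey(args[0])) {
            return null;
        }
        String[] rest = Arrays.copyOfRange(args, 1, args.length);
        programs.get(args[0]).accept(rest);
        return rest;
    }

    public static void main(String[] args) {
        DriverSketch driver = new DriverSketch();
        driver.addProgram("wordcount",
                rest -> System.out.println("wordcount got " + Arrays.toString(rest)));
        // Prints: wordcount got [input, output]
        driver.dispatch(new String[] {"wordcount", "input", "output"});
    }
}
```

In this sketch the program name never reaches the selected program, which is why the example's own argument parsing only ever sees input and output.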
GenericOptionsParser lets you specify generic Hadoop options on the command line itself.
E.g., with a generic option you can specify the following:
hadoop jar /home/hduser/WordCount/wordcount.jar WordCount -Dmapred.reduce.tasks=20 input output
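As a rough illustration of what happens to the arguments in the command above, the following plain-Java sketch separates -D properties from the remaining arguments. It is a toy model, not Hadoop's GenericOptionsParser: the class GenericOptionsSketch is hypothetical, it only handles -D, and it ignores the other generic options (such as -fs, -files or -libjars) that the real parser understands.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the generic-option split (hypothetical, not Hadoop's GenericOptionsParser).
class GenericOptionsSketch {
    // Properties collected from -D options; the real parser stores these in the Configuration.
    public final Map<String, String> properties = new LinkedHashMap<>();
    // Everything that was not recognised as a -D option.
    public final List<String> remaining = new ArrayList<>();

    public GenericOptionsSketch(String[] args) {
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.startsWith("-D")) {
                // Accept both "-Dkey=value" and "-D key=value".
                String pair = arg.length() > 2
                        ? arg.substring(2)
                        : (i + 1 < args.length ? args[++i] : "");
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    properties.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
            } else {
                remaining.add(arg);
            }
        }
    }

    public String[] getRemainingArgs() {
        return remaining.toArray(new String[0]);
    }

    public static void main(String[] args) {
        GenericOptionsSketch parsed = new GenericOptionsSketch(
                new String[] {"-Dmapred.reduce.tasks=20", "input", "output"});
        System.out.println(parsed.properties);                          // {mapred.reduce.tasks=20}
        System.out.println(Arrays.toString(parsed.getRemainingArgs())); // [input, output]
    }
}
```

The point of the sketch is that the -D option is consumed by the parser, so otherArgs still holds only input and output.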