Hadoop GenericOptionsParser

I'm running the classic Hadoop word count program and can't figure out how GenericOptionsParser works in the following scenario.

String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

Command to run the word count program:

hadoop jar /home/hduser/WordCount/wordcount.jar WordCount input output

With the above command, GenericOptionsParser picks up input as otherArgs[0] and output as otherArgs[1]. Why doesn't it pick up WordCount as an argument? How exactly does it work?

I've looked at the GenericOptionsParser source code from hadoop utils but couldn't make much sense of it. Any guidance would be really helpful...

The_Tourist asked Sep 18 '25 23:09



1 Answer

If the jar you are using here (wordcount.jar) is hadoop-examples*.jar, then it is a runnable jar whose main class is org.apache.hadoop.examples.ExampleDriver.

ExampleDriver treats the first argument as the example name (wordcount, teragen, terasort, etc.). If that name is a valid example, it is filtered out and is not passed on to the example itself.

See the following method:

org.apache.hadoop.util.ProgramDriver#driver(String[] args) 

After this initial filtering, the example class org.apache.hadoop.examples.WordCount is invoked with the remaining arguments (input output). In other words, org.apache.hadoop.examples.WordCount is not called directly.
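
To illustrate the dispatch step described above, here is a minimal, self-contained sketch. It is not the Hadoop ProgramDriver source: DriverSketch, addProgram and run are hypothetical names made up for this example, and it only mimics the idea of looking up the first argument as a program name and handing the rest to that program.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for a ProgramDriver-style dispatcher (not Hadoop code).
public class DriverSketch {
    // Registered program names mapped to their entry points.
    private final Map<String, Consumer<String[]>> programs = new LinkedHashMap<>();

    public void addProgram(String name, Consumer<String[]> entryPoint) {
        programs.put(name, entryPoint);
    }

    // Returns 0 if a program was dispatched, -1 if the name was missing or unknown.
    public int run(String[] args) {
        if (args.length == 0 || !programs.containsKey(args[0])) {
            System.out.println("Unknown program. Valid names: " + programs.keySet());
            return -1;
        }
        // Drop the program name; only the remaining arguments reach the program.
        String[] remaining = Arrays.copyOfRange(args, 1, args.length);
        programs.get(args[0]).accept(remaining);
        return 0;
    }

    public static void main(String[] args) {
        DriverSketch driver = new DriverSketch();
        driver.addProgram("wordcount",
                rest -> System.out.println("wordcount got " + Arrays.toString(rest)));
        // Prints: wordcount got [input, output]
        driver.run(new String[] {"wordcount", "input", "output"});
    }
}
```

The program registered under "wordcount" never sees its own name, which is why the example's argument parsing starts at input.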

Using GenericOptionsParser lets you specify generic options on the command line itself.

E.g., with a generic option you can specify the following:

hadoop jar /home/hduser/WordCount/wordcount.jar WordCount -Dmapred.reduce.tasks=20 input output
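
As a rough illustration of what "remaining args" means for the command above, the sketch below separates -D key=value options from everything else. It is a simplified stand-in, not the real GenericOptionsParser: the class GenericOptionsSketch and its fields are hypothetical, it handles only -D, and it makes no attempt to reproduce how the real parser treats other generic options or argument ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of generic-option parsing (not Hadoop code).
public class GenericOptionsSketch {
    // Stands in for the Configuration that receives -D properties.
    public final Map<String, String> conf = new HashMap<>();
    // Stands in for what getRemainingArgs() would hand back to the application.
    public final List<String> remaining = new ArrayList<>();

    public GenericOptionsSketch(String[] args) {
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.startsWith("-D") && arg.length() > 2) {
                setProperty(arg.substring(2));      // form: -Dkey=value
            } else if (arg.equals("-D") && i + 1 < args.length) {
                setProperty(args[++i]);             // form: -D key=value
            } else {
                remaining.add(arg);                 // application argument
            }
        }
    }

    // Splits "key=value" at the first '=' and stores it; ignores malformed input.
    private void setProperty(String keyValue) {
        int eq = keyValue.indexOf('=');
        if (eq > 0) {
            conf.put(keyValue.substring(0, eq), keyValue.substring(eq + 1));
        }
    }

    public static void main(String[] args) {
        GenericOptionsSketch parsed = new GenericOptionsSketch(
                new String[] {"-Dmapred.reduce.tasks=20", "input", "output"});
        System.out.println(parsed.conf.get("mapred.reduce.tasks")); // prints 20
        System.out.println(parsed.remaining);                       // prints [input, output]
    }
}
```

In this model the -D option ends up in the configuration, so otherArgs would still contain only input and output.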
SachinJ answered Sep 20 '25 13:09
