 

Problem with -libjars in Hadoop

I am trying to run a MapReduce job on Hadoop, but I am facing an error and I am not sure what is going wrong. I have to pass library JARs which are required by my mapper.

I am executing the following in the terminal:

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar /home/hadoop/vardtst.jar -libjars /home/hadoop/clui.jar -libjars /home/hadoop/model.jar gutenberg ou101

and I am getting the following Exception:

at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

Please help. Thanks.

asked Jul 31 '11 by Shrish Bajpai


2 Answers

A subtle but important point is also worth noting: the way to specify additional JARs for the JVMs running the distributed map and reduce tasks is very different from the way to specify them for the JVM running the job client.

  • -libjars makes the JARs available only to the JVMs running the remote map and reduce tasks

  • To make the same JARs available to the client JVM (the JVM created when you run the hadoop jar command), you need to set the HADOOP_CLASSPATH environment variable:

$ export LIBJARS=/path/jar1,/path/jar2            # comma-separated for -libjars
$ export HADOOP_CLASSPATH=/path/jar1:/path/jar2   # colon-separated for the client JVM
$ hadoop jar my-example.jar com.example.MyTool -libjars ${LIBJARS} -mytoolopt value

See: http://grepalex.com/2013/02/25/hadoop-libjars/

Another cause of incorrect -libjars behaviour can be a wrong implementation or initialization of your custom job class:

  • The job class must implement the Tool interface
  • The Configuration instance must be obtained by calling getConf() instead of creating a new one (a minimal sketch follows the link below)

See: http://kickstarthadoop.blogspot.ca/2012/05/libjars-not-working-in-custom-mapreduce.html
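
A minimal sketch of a driver that satisfies both points (MyJob is a hypothetical class name and the job wiring is elided, so treat this as an illustration rather than a drop-in class):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyJob extends Configured implements Tool { // point 1: implements Tool

      public int run(String[] args) throws Exception {
        Configuration conf = getConf(); // point 2: getConf(), not "new Configuration()"
        Job job = new Job(conf, "my job");
        job.setJarByClass(MyJob.class);
        // ... set mapper, reducer, input/output paths here ...
        return job.waitForCompletion(true) ? 0 : 1;
      }

      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyJob(), args));
      }
    }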

answered Sep 26 '22 by Vladimir Kroz


When you specify -libjars with the hadoop jar command, first make sure that you edit your driver class as shown below:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class myDriverClass extends Configured implements Tool {

      public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new myDriverClass(), args);
        System.exit(res);
      }

      public int run(String[] args) throws Exception {
        // Configuration processed by ToolRunner
        Configuration conf = getConf();
        Job job = new Job(conf, "My Job");

        // ... set mapper, reducer, input/output formats and paths here ...

        return job.waitForCompletion(true) ? 0 : 1;
      }
    }

Now edit your "hadoop jar" command as shown below:

hadoop jar YourApplication.jar [myDriverClass] -libjars path/to/jar/file args

(Note that generic options such as -libjars must appear before your application-specific arguments, because GenericOptionsParser stops parsing at the first argument it does not recognize.)

Now let's understand what happens underneath. Basically, we handle the new command-line arguments by implementing the Tool interface. ToolRunner is used to run classes that implement the Tool interface. It works in conjunction with GenericOptionsParser to parse the generic Hadoop command-line arguments and modify the Configuration of the Tool.

Within our main() we call ToolRunner.run(new Configuration(), new myDriverClass(), args). This runs the given Tool via Tool.run(String[]) after parsing the generic arguments. It uses the given Configuration, or builds one if it is null, and then sets the Tool's configuration to the possibly modified version of the conf.
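
As a rough illustration, ToolRunner.run behaves approximately like the simplified sketch below (paraphrased from the behaviour described above; not the verbatim Hadoop source):

    public static int run(Configuration conf, Tool tool, String[] args) throws Exception {
      if (conf == null) {
        conf = new Configuration(); // builds one if it's null
      }
      // GenericOptionsParser consumes -libjars, -D, -files, etc. and updates conf
      GenericOptionsParser parser = new GenericOptionsParser(conf, args);
      tool.setConf(conf); // hand the possibly modified conf to the Tool
      String[] toolArgs = parser.getRemainingArgs(); // whatever is left over for Tool.run
      return tool.run(toolArgs);
    }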

Now, within the run method, calling getConf() gives us the modified version of the Configuration. So make sure that you have the line below in your code. If you implement everything else but still use Configuration conf = new Configuration(), nothing will work.

Configuration conf = getConf();
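
For contrast, the broken and the working pattern side by side:

    // Configuration conf = new Configuration(); // broken: ignores the conf that ToolRunner prepared
    Configuration conf = getConf();               // works: picks up -libjars and the other generic options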

answered Sep 24 '22 by Isaiah4110