I have a Hadoop job in which the mapper must use an external jar.
I tried to pass this jar to the mapper's JVM
via the -libjars argument on the hadoop command
hadoop jar mrrunner.jar DAGMRRunner -libjars <path_to_jar>/colt.jar
via job.addFileToClassPath
job.addFileToClassPath(new Path("<path_to_jar>/colt.jar"));
on HADOOP_CLASSPATH.
g1mihai@hydra:/home/g1mihai/$ echo $HADOOP_CLASSPATH
<path_to_jar>/colt.jar
None of these methods work. This is the stack trace I get back. The missing class it complains about is SparseDoubleMatrix1D is in colt.jar.
Let me know if I should provide any additional debug info. Thanks.
15/02/14 16:47:51 INFO mapred.MapTask: Starting flush of map output
15/02/14 16:47:51 INFO mapred.LocalJobRunner: map task executor complete.
15/02/14 16:47:51 WARN mapred.LocalJobRunner: job_local368086771_0001
java.lang.Exception: java.lang.NoClassDefFoundError: Lcern/colt/matrix/impl/SparseDoubleMatrix1D;
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.NoClassDefFoundError: Lcern/colt/matrix/impl/SparseDoubleMatrix1D;
at java.lang.Class.getDeclaredFields0(Native Method)
at java.lang.Class.privateGetDeclaredFields(Class.java:2499)
at java.lang.Class.getDeclaredField(Class.java:1951)
at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at BoostConnector.ConnectCalculateBoost(BoostConnector.java:39)
at DAGMapReduceSearcher$Map.map(DAGMapReduceSearcher.java:46)
at DAGMapReduceSearcher$Map.map(DAGMapReduceSearcher.java:22)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: cern.colt.matrix.impl.SparseDoubleMatrix1D
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 28 more
I believe that this question deserves a detailed answer, I was stuck with this yesterday and wasted a lot of time. I hope this answer helps everyone who happen to run into this. There are couple of options to fix this issue:
Include the external jar (dependency JAR) as part of your application jar file. You can easily do this using Eclipse. The disadvantage of this option is that it will bloat up your application jar and your MapRed job will take much more time to get executed. Every time your dependency version changes you will have to recompile the application etc. It's better not to go this route.
Using "Hadoop classpath" - On the command line run the command "hadoop classpath" and then find a suitable folder and copy your jar file to that location and hadoop will pick up the dependencies from there. This wont work with CloudEra etc as you may not have read/write rights to copy files to the hadoop classpath folders.
The option that I made use of was specifying the -LIBJARS with the Hadoop jar command. First make sure that you edit your driver class:
public class myDriverClass extends Configured implements Tool {
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new myDriverClass(), args);
System.exit(res);
}
public int run(String[] args) throws Exception
{
// Configuration processed by ToolRunner
Configuration conf = getConf();
Job job = new Job(conf, "My Job");
...
...
return job.waitForCompletion(true) ? 0 : 1;
}
}
Now edit your "hadoop jar" command as shown below:
hadoop jar YourApplication.jar [myDriverClass] args -libjars path/to/jar/file
Now lets understand what happens underneath. Basically we are handling the new command line arguments by implementing the TOOL Interface. ToolRunner is used to run classes implementing Tool interface. It works in conjunction with GenericOptionsParser to parse the generic hadoop command line arguments and modifies the Configuration of the Tool.
Within our Main() we are calling ToolRunner.run(new Configuration(), new myDriverClass(), args)
- this runs the given Tool by Tool.run(String[]), after parsing with the given generic arguments. It uses the given Configuration, or builds one if it's null and then sets the Tool's configuration with the possibly modified version of the conf.
Now within the run method, when we call getConf() we get the modified version of the Configuration. So make sure that you have the below line in your code. If you implement everything else and still make use of Configuration conf = new Configuration(), nothing would work.
Configuration conf = getConf();
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With