
Hadoop ClassNotFoundException related to MapClass

I see many questions about ClassNotFoundExceptions, "No job jar file set", and Hadoop. Most of them point to a missing setJarByClass call (on either JobConf or Job) in the configuration. I'm a bit puzzled by the exception I'm hitting because I do have that set. Here is everything I think is relevant (please let me know if I've omitted anything):

 echo $CLASS_PATH
/root/javajars/mysql-connector-java-5.1.22/mysql-connector-java-5.1.22-bin.jar:/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u5.jar:.

Code (mostly omitted)

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.IntWritable;

import java.io.IOException;
import java.util.Iterator;
import java.lang.System;
import java.net.URL;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.ResultSet;

public class QueryTable extends Configured implements Tool {

    public static class MapClass extends Mapper<Object, Text, Text, IntWritable>{

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            ...
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable>{
        private IntWritable result = new IntWritable();

        public void reduce (Text key, Iterable<IntWritable> values,
                            Context context) throws IOException, InterruptedException {
            ...
        }
    }

    public int run(String[] args) throws Exception {
        //Configuration conf = getConf();
        Configuration conf = new Configuration();

        Job job = new Job(conf, "QueryTable");
        job.setJarByClass(QueryTable.class);

        Path in =  new Path(args[0]);
        Path out = new Path(args[1]);
        FileInputFormat.setInputPaths(job, in);
        //FileInputFormat.addInputPath(job, in);
        FileOutputFormat.setOutputPath(job, out);

        job.setMapperClass(MapClass.class);
        job.setCombinerClass(Reduce.class); // new
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        // Value type must match the IntWritable declared by MapClass and Reduce.
        job.setOutputValueClass(IntWritable.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new QueryTable(), args);
        System.exit(res);
    }
}

I then compile, create the jar, and run:

javac QueryTable.java -d QueryTable
jar -cvf QueryTable.jar -C QueryTable/ .
hadoop jar QueryTable.jar QueryTable input output

Here is the exception:

13/01/14 17:09:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
**13/01/14 17:09:30 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).**
13/01/14 17:09:30 INFO input.FileInputFormat: Total input paths to process : 1
13/01/14 17:09:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/01/14 17:09:30 WARN snappy.LoadSnappy: Snappy native library not loaded
13/01/14 17:09:31 INFO mapred.JobClient: Running job: job_201301081120_0045
13/01/14 17:09:33 INFO mapred.JobClient:  map 0% reduce 0%
13/01/14 17:09:39 INFO mapred.JobClient: Task Id : attempt_201301081120_0045_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: QueryTable$MapClass
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1004)
    at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:217)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:602)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
    at org.apache.hadoop.mapred.Child.main(Child.java:260)
Caused by: java.lang.ClassNotFoundException: QueryTable$MapClass
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadCl

Sorry for that huge wall of text. I don't understand why I'm getting the warning about no job jar file set. I set it in my run method. Also, the warning is issued by JobClient, and in my code I'm using Job not JobClient. If you've got any ideas or feedback I'm very interested. Thanks for your time!

EDIT

Contents of jar:

jar -tvf QueryTable.jar
    0 Tue Jan 15 14:40:46 EST 2013 META-INF/
   68 Tue Jan 15 14:40:46 EST 2013 META-INF/MANIFEST.MF
 3091 Tue Jan 15 14:40:10 EST 2013 QueryTable.class
 3173 Tue Jan 15 14:40:10 EST 2013 QueryTable$MapClass.class
 1699 Tue Jan 15 14:40:10 EST 2013 QueryTable$Reduce.class
cbrown asked Jan 15 '13 13:01


1 Answer

I was able to fix the problem by declaring a package at the top of my source.

package com.foo.hadoop;

I then compiled, created the jar, and ran hadoop with the fully qualified class name (the package prepended to the class name).

hadoop jar QueryTable.jar com.foo.hadoop.QueryTable input output

I understand this is what most people would have done to begin with, though I would have expected it to work without a package. It is definitely better practice, and it has allowed me to proceed.
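
For anyone hitting the same warning, one way to check where a class is actually being picked up from is a small standalone probe like the one below. This is only a minimal sketch using standard JDK calls; the class name WhereLoaded and its methods are hypothetical and not part of Hadoop. It rests on an assumption that is not confirmed anywhere in this thread: that setJarByClass can only record a job jar when the class resolves to a jar: URL, and not when it is found first in a plain directory on the classpath.

```java
import java.net.URL;

// Hypothetical diagnostic helper, not part of Hadoop.
public class WhereLoaded {

    // Turns a binary class name such as "com.foo.hadoop.QueryTable$MapClass"
    // into the resource path "com/foo/hadoop/QueryTable$MapClass.class".
    public static String resourcePath(String binaryName) {
        return binaryName.replace('.', '/') + ".class";
    }

    // Returns the URL at which the class's own loader finds its class file,
    // or null if the resource cannot be found.
    public static URL locate(Class<?> c) {
        String path = resourcePath(c.getName());
        ClassLoader loader = c.getClassLoader();
        // Classes from the bootstrap loader report a null class loader.
        return loader == null
                ? ClassLoader.getSystemResource(path)
                : loader.getResource(path);
    }

    // True only when the class file sits inside a jar (a "jar:" URL).
    public static boolean isInJar(URL url) {
        return url != null && "jar".equals(url.getProtocol());
    }

    public static void main(String[] args) throws Exception {
        // Default to this class when no class name is given.
        String name = args.length > 0 ? args[0] : WhereLoaded.class.getName();
        URL url = locate(Class.forName(name));
        System.out.println(name + " -> " + url);
        System.out.println("inside a jar: " + isInJar(url));
    }
}
```

To try it, compile it and run it with the job jar and the Hadoop jars on the classpath, passing the class to inspect as the argument (for example com.foo.hadoop.QueryTable). If the printed URL starts with file: rather than jar:, the class is coming from a directory, which under the assumption above might explain a "No job jar file set" warning.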

cbrown answered Nov 18 '22 11:11