Run Hadoop job without using JobConf

I can't find a single example of submitting a Hadoop job that does not use the deprecated JobConf class. JobClient, which hasn't been deprecated, still only supports methods that take a JobConf parameter.

Can someone please point me at an example of Java code submitting a Hadoop map/reduce job using only the Configuration class (not JobConf), and using the mapreduce.lib.input package instead of mapred.input?

asked Jan 22 '10 by Greg Cottman


People also ask

What is JobConf in Hadoop?

JobConf is the primary interface for a user to describe a map-reduce job to the Hadoop framework for execution. The framework tries to faithfully execute the job as described by the JobConf; however, some configuration parameters may have been marked as final by administrators and hence cannot be altered.
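For contrast, here is a minimal sketch of the deprecated JobConf-based submission style the question is trying to avoid. The class name and paths are placeholders, not from the question:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class OldApiJob {
    public static void main(String[] args) throws Exception {
        // Deprecated style: JobConf from the mapred package describes the job
        JobConf conf = new JobConf(OldApiJob.class);
        conf.setJobName("old-api-job");
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        // JobClient.runJob only accepts a JobConf, which is the asker's complaint
        JobClient.runJob(conf);
    }
}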

What happens when no mapper class is specified in a MapReduce job?

If the programmer does not set a mapper class via JobConf.setMapperClass, then IdentityMapper.class is used as the default. Even if you do not specify a mapper, at least one mapper will still run.
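The same default carries over to the new API: the base org.apache.hadoop.mapreduce.Mapper class performs an identity map, so a job that never sets a mapper passes records through unchanged. A minimal sketch, with the job name and paths as placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NoMapperJob {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "no-mapper-set");
        // No setMapperClass call: the base Mapper's identity map is used,
        // so input records are written to the output unchanged
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}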

What happens if you try to run a Hadoop job with an output directory that is already present?

The job will throw an error. HDFS follows a write-once, read-many model, so if the output directory you specify already exists, the job fails rather than overwrite it.
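A common workaround is to remove (or check for) the output directory before submitting. A sketch, with "data/output" as a placeholder path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OutputDirGuard {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path("data/output");  // placeholder path
        // Delete the directory recursively if it already exists,
        // so the job does not fail with "output directory already exists"
        if (fs.exists(out)) {
            fs.delete(out, true);
        }
    }
}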


1 Answer

Hope this is helpful:

import java.io.File;

import org.apache.commons.io.FileUtils;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MapReduceExample extends Configured implements Tool {

    // Identity-style mapper using the new org.apache.hadoop.mapreduce API
    static class MyMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws java.io.IOException, InterruptedException {
            // Bump a custom counter, then pass the record through unchanged
            context.getCounter("mygroup", "jeff").increment(1);
            context.write(key, value);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        // Job wraps a plain Configuration (via getConf()); no JobConf anywhere
        Job job = new Job(getConf());
        job.setMapperClass(MyMapper.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Return a nonzero exit code if the job fails, per the Tool contract
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // Clear the local output directory so re-runs don't fail,
        // then hardcode paths for a local test run
        FileUtils.deleteDirectory(new File("data/output"));
        args = new String[] { "data/input", "data/output" };
        ToolRunner.run(new MapReduceExample(), args);
    }
}
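The example above is map-only. If you also want a reduce phase under the new API, a minimal sketch of an identity-style reducer (MyReducer is a hypothetical name) could be nested in the same class and registered with job.setReducerClass(MyReducer.class):

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Identity-style reducer for the new org.apache.hadoop.mapreduce API
static class MyReducer extends Reducer<LongWritable, Text, LongWritable, Text> {

    @Override
    protected void reduce(LongWritable key, Iterable<Text> values, Context context)
            throws java.io.IOException, InterruptedException {
        // Emit each value unchanged under its original key
        for (Text value : values) {
            context.write(key, value);
        }
    }
}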
answered Oct 12 '22 by zjffdu