Calling a mapreduce job from a simple java program

I have been trying to call a MapReduce job from a simple Java program in the same package. I tried to reference the MapReduce jar file in my Java program and call it through the RunJar.main(String args[]) method, passing the input and output paths for the MapReduce job as well, but the program didn't work.

How do I run such a program, where I just pass the input path, output path, and jar path to its main method? Is it possible to run a MapReduce job (jar) this way? I want to do this because I need to run several MapReduce jobs one after another, with my Java program calling each job by referring to its jar file. If this is possible, I might as well use a simple servlet to make the calls and read the output files for graphing.

    /*
     * To change this template, choose Tools | Templates
     * and open the template in the editor.
     */

    /**
     *
     * @author root
     */
    import org.apache.hadoop.util.RunJar;
    import java.util.*;

    public class callOther {

        public static void main(String args[]) throws Throwable {
            ArrayList arg = new ArrayList();
            String output = "/root/Desktp/output";
            arg.add("/root/NetBeansProjects/wordTool/dist/wordTool.jar");
            arg.add("/root/Desktop/input");
            arg.add(output);
            RunJar.main((String[]) arg.toArray(new String[0]));
        }
    }
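For the goal of running several jobs one after another, the control flow itself can be sketched in plain Java. This is only an illustration, not Hadoop code: the class name JobChain and its methods are hypothetical, and each stage is a Callable<Boolean> standing in for something that configures a job and returns the result of job.waitForCompletion(true). The chain stops at the first stage that fails, so a later job never reads output that an earlier job did not finish writing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical helper: runs named stages in order and stops at the first failure. */
class JobChain {

    /** One stage: a name plus a task that returns true on success. */
    static final class Stage {
        final String name;
        final Callable<Boolean> task;

        Stage(String name, Callable<Boolean> task) {
            this.name = name;
            this.task = task;
        }
    }

    private final List<Stage> stages = new ArrayList<>();

    JobChain add(String name, Callable<Boolean> task) {
        stages.add(new Stage(name, task));
        return this;
    }

    /**
     * Returns the names of the stages that completed successfully.
     * A stage that returns false or throws ends the chain.
     */
    List<String> run() {
        List<String> completed = new ArrayList<>();
        for (Stage stage : stages) {
            boolean ok;
            try {
                ok = stage.task.call();
            } catch (Exception e) {
                ok = false;
            }
            if (!ok) {
                break;
            }
            completed.add(stage.name);
        }
        return completed;
    }

    public static void main(String[] args) {
        // In a real driver each lambda would configure a Job and
        // return job.waitForCompletion(true); here they are stand-ins.
        List<String> done = new JobChain()
                .add("wordcount", () -> true)
                .add("sort", () -> true)
                .run();
        System.out.println("Completed stages: " + done);
    }
}
```

With real jobs, the output path of one stage would typically be passed as the input path of the next.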

asked Mar 24 '12 06:03

Ravi Trivedi

2 Answers

Oh, please don't do it with RunJar; the Java API is very good.

See how you can start a job from normal code:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    // create a configuration
    Configuration conf = new Configuration();
    // create a new job based on the configuration
    Job job = new Job(conf);
    // here you have to put your mapper class
    job.setMapperClass(Mapper.class);
    // here you have to put your reducer class
    job.setReducerClass(Reducer.class);
    // here you have to set the jar which contains your
    // map/reduce class, so you can use the mapper class
    job.setJarByClass(Mapper.class);
    // key/value of your reducer output
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    // this sets the format of your input, can be TextInputFormat
    job.setInputFormatClass(SequenceFileInputFormat.class);
    // same with output
    job.setOutputFormatClass(TextOutputFormat.class);
    // here you can set the path of your input
    SequenceFileInputFormat.addInputPath(job, new Path("files/toMap/"));
    // this deletes possible output paths to prevent job failures
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("files/out/processed/");
    fs.delete(out, true);
    // finally set the empty out path
    TextOutputFormat.setOutputPath(job, out);

    // this waits until the job completes and prints debug output to STDOUT
    // or wherever your log4j properties send it
    job.waitForCompletion(true);

If you are using an external cluster, you also have to add the following settings to your configuration:

    // this should match the value defined in your mapred-site.xml
    conf.set("mapred.job.tracker", "jobtracker.com:50001");
    // this should match the value defined in your core-site.xml
    conf.set("fs.default.name", "hdfs://namenode.com:9000");

This should be no problem when hadoop-core.jar is on your application container's classpath. But I think you should add some kind of progress indicator to your web page, because a Hadoop job may take minutes to hours to complete ;)

For YARN (Hadoop 2 and later)
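One way to make a progress indicator possible is to start the blocking call on a background thread and let the page poll for its state. The following is a minimal sketch under stated assumptions: BackgroundJobRunner and its Status enum are hypothetical names, not part of Hadoop or the servlet API, and the submitted Callable<Boolean> stands in for a call such as job.waitForCompletion(true).

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper: runs long tasks off the request thread and reports their state. */
class BackgroundJobRunner {

    enum Status { RUNNING, SUCCEEDED, FAILED, UNKNOWN }

    private final ExecutorService pool = Executors.newSingleThreadExecutor();
    private final Map<String, Future<Boolean>> jobs = new ConcurrentHashMap<>();

    /** Starts the task and immediately returns an id that a page can poll with. */
    String submit(Callable<Boolean> task) {
        String id = UUID.randomUUID().toString();
        jobs.put(id, pool.submit(task));
        return id;
    }

    /** Non-blocking status check, suitable for a polling request. */
    Status status(String id) {
        Future<Boolean> future = jobs.get(id);
        if (future == null) {
            return Status.UNKNOWN;
        }
        if (!future.isDone()) {
            return Status.RUNNING;
        }
        try {
            return future.get() ? Status.SUCCEEDED : Status.FAILED;
        } catch (Exception e) {
            // The task threw an exception or was cancelled.
            return Status.FAILED;
        }
    }

    void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        BackgroundJobRunner runner = new BackgroundJobRunner();
        // Stand-in for a long-running job.
        String id = runner.submit(() -> { Thread.sleep(200); return true; });
        System.out.println(runner.status(id));
        Thread.sleep(500);
        System.out.println(runner.status(id));
        runner.shutdown();
    }
}
```

A request handler would call submit once and return the id; a second, lightweight request could then call status(id) every few seconds. A single-thread executor is used here so jobs queue up rather than run concurrently; that is a design choice for the sketch, not a requirement.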

For YARN, the following configurations need to be set.

    // this should match the value defined in your yarn-site.xml
    conf.set("yarn.resourcemanager.address", "yarn-manager.com:50001");
    // the framework is now "yarn"; it should be defined like this in mapred-site.xml
    conf.set("mapreduce.framework.name", "yarn");
    // this should match the value defined in your core-site.xml
    // (Hadoop 2 names this key fs.defaultFS; fs.default.name is the deprecated name)
    conf.set("fs.default.name", "hdfs://namenode.com:9000");
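To compare the two sets of keys side by side, here is a small plain-Java sketch that collects them into maps. ClusterSettings and its two methods are hypothetical names used only for illustration; the property keys are the ones quoted in this answer, and the addresses are whatever your own cluster uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: gathers the client-side settings named in this answer. */
class ClusterSettings {

    /** Settings for a classic (JobTracker-based) cluster. */
    static Map<String, String> classic(String jobTracker, String nameNode) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("mapred.job.tracker", jobTracker);
        settings.put("fs.default.name", nameNode);
        return settings;
    }

    /** Settings for a YARN cluster. */
    static Map<String, String> yarn(String resourceManager, String nameNode) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("yarn.resourcemanager.address", resourceManager);
        settings.put("mapreduce.framework.name", "yarn");
        settings.put("fs.default.name", nameNode);
        return settings;
    }

    public static void main(String[] args) {
        Map<String, String> settings =
                yarn("yarn-manager.com:50001", "hdfs://namenode.com:9000");
        for (Map.Entry<String, String> entry : settings.entrySet()) {
            // In a driver this line would be conf.set(entry.getKey(), entry.getValue());
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

Keeping the values in one place like this makes it easier to load them from a properties file later instead of hard-coding host names.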

answered Sep 19 '22 15:09

Thomas Jungblut

Calling MapReduce job from java web application (Servlet)

You can call a MapReduce job from a web application using the Java API. Here is a small example of calling a MapReduce job from a servlet. The steps are given below:

Step 1: First, create a MapReduce driver servlet class, and develop your mapper and reducer classes. Here is a sample code snippet:

CallJobFromServlet.java

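    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class CallJobFromServlet extends HttpServlet {

        @Override
        protected void doPost(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            Configuration conf = new Configuration();
            // Replace CallJobFromServlet with the name of your servlet class
            Job job = new Job(conf, "CallJobFromServlet");
            job.setJarByClass(CallJobFromServlet.class);
            job.setJobName("Job Name");
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            job.setMapperClass(Map.class);      // Map is a placeholder: use your own Mapper class
            job.setNumReduceTasks(30);
            job.setReducerClass(Reduce.class);  // Reduce is a placeholder: use your own Reducer class
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);

            // Job input path
            FileInputFormat.addInputPath(job,
                    new Path("hdfs://localhost:54310/user/hduser/input/"));
            // Job output path
            FileOutputFormat.setOutputPath(job,
                    new Path("hdfs://localhost:54310/user/hduser/output"));

            try {
                job.waitForCompletion(true);
            } catch (InterruptedException | ClassNotFoundException e) {
                // waitForCompletion declares these checked exceptions; doPost does not
                throw new ServletException(e);
            }
        }
    }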

Step 2: Place all the related jar files (Hadoop and application-specific jars) inside the lib folder of the web server (e.g. Tomcat). This is mandatory for accessing the Hadoop configuration (the Hadoop 'conf' folder holds the configuration XML files, i.e. core-site.xml, hdfs-site.xml, etc.). Just copy the jars from the Hadoop lib folder to the web server's (Tomcat's) lib directory. The jar names are as follows:

1.  commons-beanutils-1.7.0.jar
2.  commons-beanutils-core-1.8.0.jar
3.  commons-cli-1.2.jar
4.  commons-collections-3.2.1.jar
5.  commons-configuration-1.6.jar
6.  commons-httpclient-3.0.1.jar
7.  commons-io-2.1.jar
8.  commons-lang-2.4.jar
9.  commons-logging-1.1.1.jar
10. hadoop-client-1.0.4.jar
11. hadoop-core-1.0.4.jar
12. jackson-core-asl-1.8.8.jar
13. jackson-mapper-asl-1.8.8.jar
14. jersey-core-1.8.jar
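A quick way to confirm that the jars actually ended up on the classpath is to try to locate a couple of Hadoop classes at startup. This is a rough sketch, not part of the original steps: ClasspathCheck is a hypothetical helper, and the two class names in main are only examples of classes the job setup relies on.

```java
/** Hypothetical helper: reports whether a class can be found on the classpath. */
class ClasspathCheck {

    static boolean isAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.hadoop.conf.Configuration",
            "org.apache.hadoop.mapreduce.Job"
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isAvailable(name) ? "found" : "MISSING"));
        }
    }
}
```

Running the same check from inside the web application (for example in a servlet's init method) tests the container's class loader rather than the command-line one.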

Step 3: Deploy your web application to the web server (in the 'webapps' folder for Tomcat).

Step 4: Create a JSP file and link the servlet class (CallJobFromServlet.java) in the form's action attribute. Here is a sample code snippet:

Index.jsp

    <!-- method="post" so that the request reaches the servlet's doPost -->
    <form id="trigger_hadoop" name="trigger_hadoop" method="post" action="./CallJobFromServlet">
        <span class="back">Trigger Hadoop Job from Web Page</span>
        <input type="submit" name="submit" value="Trigger Job" />
    </form>

answered Sep 21 '22 15:09

RS Software -Competency Team
