 

Calling a mapreduce job from a simple java program


I have been trying to call a MapReduce job from a simple Java program in the same package. I tried to refer to the MapReduce jar file in my Java program and call it with the runJar(String args[]) method, also passing the input and output paths for the MapReduce job, but the program didn't work.


How do I run a program where I just pass the input path, output path and jar path to its main method? Is it possible to run a MapReduce job (jar) that way? I want to do this because I want to run several MapReduce jobs one after another, where my Java program calls each job by referring to its jar file. If this is possible, I might as well use a simple servlet to do the calling and refer to its output files for graphing purposes.


    /*
     * To change this template, choose Tools | Templates
     * and open the template in the editor.
     */

    /**
     * @author root
     */
    import org.apache.hadoop.util.RunJar;
    import java.util.*;

    public class callOther {

        public static void main(String args[]) throws Throwable {

            ArrayList<String> arg = new ArrayList<String>();

            String output = "/root/Desktop/output";

            arg.add("/root/NetBeansProjects/wordTool/dist/wordTool.jar");
            arg.add("/root/Desktop/input");
            arg.add(output);

            RunJar.main(arg.toArray(new String[0]));
        }
    }
asked Mar 24 '12 by Ravi Trivedi



2 Answers

Oh please, don't do it with RunJar; the Java API is very good.

Here is how you can start a job from normal code:

    // create a configuration
    Configuration conf = new Configuration();
    // create a new job based on the configuration
    Job job = new Job(conf);
    // here you have to put your mapper class
    job.setMapperClass(Mapper.class);
    // here you have to put your reducer class
    job.setReducerClass(Reducer.class);
    // here you have to set the jar which is containing your
    // map/reduce class, so you can use the mapper class
    job.setJarByClass(Mapper.class);
    // key/value of your reducer output
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    // this is setting the format of your input, can be TextInputFormat
    job.setInputFormatClass(SequenceFileInputFormat.class);
    // same with output
    job.setOutputFormatClass(TextOutputFormat.class);
    // here you can set the path of your input
    SequenceFileInputFormat.addInputPath(job, new Path("files/toMap/"));
    // this deletes possible output paths to prevent job failures
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("files/out/processed/");
    fs.delete(out, true);
    // finally set the empty out path
    TextOutputFormat.setOutputPath(job, out);

    // this waits until the job completes and prints debug out to STDOUT or whatever
    // has been configured in your log4j properties.
    job.waitForCompletion(true);
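The question also mentions running several MapReduce jobs one after another. With this API that is just a matter of waiting for one job to finish before configuring and submitting the next. Here is a rough sketch along those lines (FirstMapper, FirstReducer, SecondMapper, SecondReducer and the paths are placeholders I made up, not anything from the original code):

    Configuration conf = new Configuration();

    // first job: reads the raw input and writes an intermediate result
    Job first = new Job(conf, "first job");
    first.setJarByClass(FirstMapper.class);
    first.setMapperClass(FirstMapper.class);
    first.setReducerClass(FirstReducer.class);
    first.setOutputKeyClass(Text.class);
    first.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(first, new Path("files/in/"));
    FileOutputFormat.setOutputPath(first, new Path("files/tmp/"));

    // block until the first job finishes; bail out if it failed
    if (!first.waitForCompletion(true)) {
        System.exit(1);
    }

    // second job: reads what the first job wrote
    Job second = new Job(conf, "second job");
    second.setJarByClass(SecondMapper.class);
    second.setMapperClass(SecondMapper.class);
    second.setReducerClass(SecondReducer.class);
    second.setOutputKeyClass(Text.class);
    second.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(second, new Path("files/tmp/"));
    FileOutputFormat.setOutputPath(second, new Path("files/out/"));

    System.exit(second.waitForCompletion(true) ? 0 : 1);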

If you are using an external cluster, add the following information to your configuration:

    // this should be like defined in your mapred-site.xml
    conf.set("mapred.job.tracker", "jobtracker.com:50001");

    // like defined in hdfs-site.xml
    conf.set("fs.default.name", "hdfs://namenode.com:9000");

This should be no problem when hadoop-core.jar is on your application container's classpath. But you should put some kind of progress indicator on your web page, because it may take minutes to hours for a Hadoop job to complete ;)
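If you want such an indicator, one possible approach (a sketch on my part, reusing the job object from the snippet above) is to submit the job without blocking and poll its progress instead of calling waitForCompletion:

    // non-blocking submit, then poll progress; mapProgress()/reduceProgress()
    // return fractions between 0.0 and 1.0
    job.submit();
    while (!job.isComplete()) {
        System.out.printf("map %.0f%% reduce %.0f%%%n",
                job.mapProgress() * 100, job.reduceProgress() * 100);
        Thread.sleep(5000);
    }
    System.out.println(job.isSuccessful() ? "job succeeded" : "job failed");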

For YARN (> Hadoop 2)

For YARN, the following configurations need to be set.

    // this should be like defined in your yarn-site.xml
    conf.set("yarn.resourcemanager.address", "yarn-manager.com:50001");

    // framework is now "yarn", should be defined like this in mapred-site.xml
    conf.set("mapreduce.framework.name", "yarn");

    // like defined in hdfs-site.xml
    conf.set("fs.default.name", "hdfs://namenode.com:9000");
answered by Thomas Jungblut

Calling a MapReduce job from a Java web application (servlet)

You can call a MapReduce job from a web application using the Java API. Here is a small example of calling a MapReduce job from a servlet. The steps are given below:

Step 1: First create a MapReduce driver servlet class, and also develop your map and reduce classes. Here is a sample code snippet:

CallJobFromServlet.java

    public class CallJobFromServlet extends HttpServlet {

        protected void doPost(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {

            Configuration conf = new Configuration();
            Job job = new Job(conf, "CallJobFromServlet");

            job.setJarByClass(CallJobFromServlet.class);
            job.setJobName("Job Name");
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            job.setMapperClass(Map.class);      // replace Map.class with your Mapper class
            job.setNumReduceTasks(30);
            job.setReducerClass(Reduce.class);  // replace Reduce.class with your Reducer class
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);

            // job input path
            FileInputFormat.addInputPath(job,
                    new Path("hdfs://localhost:54310/user/hduser/input/"));
            // job output path
            FileOutputFormat.setOutputPath(job,
                    new Path("hdfs://localhost:54310/user/hduser/output"));

            try {
                job.waitForCompletion(true);
            } catch (Exception e) {
                throw new ServletException(e);
            }
        }
    }

Step 2: Place all the related jar files (Hadoop and application-specific jars) inside the lib folder of the web server (e.g. Tomcat). This is mandatory for accessing the Hadoop configuration (the Hadoop 'conf' folder contains the configuration XML files, i.e. core-site.xml, hdfs-site.xml, etc.). Just copy the jars from the Hadoop lib folder to the web server (Tomcat) lib directory. The list of jar names is as follows:

    1.  commons-beanutils-1.7.0.jar
    2.  commons-beanutils-core-1.8.0.jar
    3.  commons-cli-1.2.jar
    4.  commons-collections-3.2.1.jar
    5.  commons-configuration-1.6.jar
    6.  commons-httpclient-3.0.1.jar
    7.  commons-io-2.1.jar
    8.  commons-lang-2.4.jar
    9.  commons-logging-1.1.1.jar
    10. hadoop-client-1.0.4.jar
    11. hadoop-core-1.0.4.jar
    12. jackson-core-asl-1.8.8.jar
    13. jackson-mapper-asl-1.8.8.jar
    14. jersey-core-1.8.jar

Step 3: Deploy your web application to the web server (in the 'webapps' folder for Tomcat).

Step 4: Create a JSP file and link the servlet class (CallJobFromServlet.java) in the form's action attribute. Here is a sample code snippet:

Index.jsp

    <form id="trigger_hadoop" name="trigger_hadoop" action="./CallJobFromServlet">
        <span class="back">Trigger Hadoop Job from Web Page</span>
        <input type="submit" name="submit" value="Trigger Job" />
    </form>
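Note that the form's action URL only works if the servlet is actually mapped to that path. Assuming a Servlet 3.0+ container (e.g. Tomcat 7 or later), a minimal sketch would use the @WebServlet annotation; on older containers an equivalent <servlet>/<servlet-mapping> entry in web.xml is needed:

    import javax.servlet.annotation.WebServlet;

    // maps the servlet to the URL used in the form's action attribute above
    @WebServlet("/CallJobFromServlet")
    public class CallJobFromServlet extends HttpServlet {
        // doPost(...) as shown in Step 1
    }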
answered by RS Software - Competency Team