I have been trying to call a MapReduce job from a simple Java program in the same package. I tried to reference the MapReduce jar file in my Java program and call it using the RunJar.main(String args[]) method, passing the input and output paths for the MapReduce job as well. But the program didn't work.
How do I run such a program where I just pass the input, output and jar path to its main method? Is it possible to run a MapReduce job (jar) this way? I want to do this because I want to run several MapReduce jobs one after another, with my Java program calling each job by referring to its jar file. If this is possible, I might as well use a simple servlet to make these calls and use the output files for graphing.
    import org.apache.hadoop.util.RunJar;
    import java.util.*;

    /**
     * @author root
     */
    public class callOther {

        public static void main(String args[]) throws Throwable {
            List<String> arg = new ArrayList<String>();
            String output = "/root/Desktp/output";

            arg.add("/root/NetBeansProjects/wordTool/dist/wordTool.jar");
            arg.add("/root/Desktop/input");
            arg.add(output);

            RunJar.main(arg.toArray(new String[0]));
        }
    }
Oh, please don't do it with RunJar; the Java API is very good.
See how you can start a job from normal code:
    // create a configuration
    Configuration conf = new Configuration();
    // create a new job based on the configuration
    Job job = new Job(conf);
    // here you have to put your mapper class
    job.setMapperClass(Mapper.class);
    // here you have to put your reducer class
    job.setReducerClass(Reducer.class);
    // here you have to set the jar which is containing your
    // map/reduce class, so you can use the mapper class
    job.setJarByClass(Mapper.class);
    // key/value of your reducer output
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    // this is setting the format of your input, can be TextInputFormat
    job.setInputFormatClass(SequenceFileInputFormat.class);
    // same with output
    job.setOutputFormatClass(TextOutputFormat.class);
    // here you can set the path of your input
    SequenceFileInputFormat.addInputPath(job, new Path("files/toMap/"));
    // this deletes possible output paths to prevent job failures
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("files/out/processed/");
    fs.delete(out, true);
    // finally set the empty out path
    TextOutputFormat.setOutputPath(job, out);

    // this waits until the job completes and prints debug out to STDOUT or whatever
    // has been configured in your log4j properties.
    job.waitForCompletion(true);
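The question also asks about running several jobs one after another. Since job.waitForCompletion(true) returns a boolean, one way to chain jobs is to run them in order and stop at the first one that reports failure. Below is a minimal sketch of that idea in plain Java, with no Hadoop dependency so that it runs on its own. JobChain, Step, Stage and the stage names are made up for illustration; in real code each step would presumably build its own Job and return job.waitForCompletion(true).

```java
import java.util.ArrayList;
import java.util.List;

public class JobChain {

    /** One unit of work; hypothetical stand-in for "configure a Job and wait for it". */
    public interface Step {
        boolean run() throws Exception;
    }

    /** A named step, so the caller can see how far the chain got. */
    public static final class Stage {
        final String name;
        final Step step;

        public Stage(String name, Step step) {
            this.name = name;
            this.step = step;
        }
    }

    /**
     * Runs the stages in order and stops at the first one that returns false
     * or throws. Returns the names of the stages that completed successfully.
     */
    public static List<String> runAll(List<Stage> stages) {
        List<String> completed = new ArrayList<>();
        for (Stage stage : stages) {
            boolean ok;
            try {
                ok = stage.step.run();
            } catch (Exception e) {
                if (e instanceof InterruptedException) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag set
                }
                System.err.println("Stage " + stage.name + " threw: " + e);
                ok = false;
            }
            if (!ok) {
                break; // later stages depend on this one, so do not run them
            }
            completed.add(stage.name);
        }
        return completed;
    }

    public static void main(String[] args) {
        List<Stage> stages = new ArrayList<>();
        // In real use: () -> buildWordCountJob().waitForCompletion(true), etc.
        stages.add(new Stage("wordcount", () -> true));
        stages.add(new Stage("sort", () -> false));   // simulated failure
        stages.add(new Stage("report", () -> true));  // never runs
        System.out.println(runAll(stages)); // prints [wordcount]
    }
}
```

When chaining real jobs this way, the output path of one stage would typically be passed as the input path of the next.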
If you are using an external cluster, you have to add the following settings to your configuration:
    // this should be like defined in your mapred-site.xml
    conf.set("mapred.job.tracker", "jobtracker.com:50001");
    // like defined in hdfs-site.xml
    conf.set("fs.default.name", "hdfs://namenode.com:9000");
This should be no problem when the hadoop-core.jar is in your application container's classpath. But I think you should add some kind of progress indicator to your web page, because it may take minutes to hours to complete a Hadoop job ;)
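One possible way to support such a progress indicator is to start the job on a background thread and let the web page poll for its status, instead of blocking the request until the job finishes. The following is a rough sketch in plain Java under that assumption. BackgroundJobs and its status strings are invented for illustration; the task passed to start() would presumably wrap job.waitForCompletion(true).

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class BackgroundJobs {
    // Single worker thread: submitted tasks run one at a time, in submission order.
    private final ExecutorService pool = Executors.newSingleThreadExecutor();
    private final Map<String, Future<Boolean>> jobs = new ConcurrentHashMap<>();

    /** Starts the task in the background and returns an id the page can poll with. */
    public String start(Callable<Boolean> task) {
        String id = UUID.randomUUID().toString();
        jobs.put(id, pool.submit(task));
        return id;
    }

    /** Returns UNKNOWN, RUNNING, SUCCEEDED or FAILED for the given id. */
    public String status(String id) {
        Future<Boolean> future = jobs.get(id);
        if (future == null) {
            return "UNKNOWN";
        }
        if (!future.isDone()) {
            return "RUNNING"; // queued or currently executing
        }
        try {
            return Boolean.TRUE.equals(future.get()) ? "SUCCEEDED" : "FAILED";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "FAILED";
        } catch (ExecutionException | CancellationException e) {
            return "FAILED"; // the task threw or was cancelled
        }
    }

    /** Stops accepting new tasks and waits up to one minute for running ones. */
    public boolean shutdownAndWait() {
        pool.shutdown();
        try {
            return pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A servlet could call start() in doPost, hand the returned id back to the browser, and answer later polling requests with status(id). For finer-grained progress, Hadoop's Job class also offers submit(), isComplete(), mapProgress() and reduceProgress(), which could replace the blocking waitForCompletion call.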
For YARN (Hadoop 2 and later)
For YARN, the following configurations need to be set.
    // this should be like defined in your yarn-site.xml
    conf.set("yarn.resourcemanager.address", "yarn-manager.com:50001");
    // framework is now "yarn", should be defined like this in mapred-site.xml
    conf.set("mapreduce.framework.name", "yarn");
    // like defined in hdfs-site.xml
    conf.set("fs.default.name", "hdfs://namenode.com:9000");
Calling MapReduce job from java web application (Servlet)
You can call a MapReduce job from a web application using the Java API. Here is a small example of calling a MapReduce job from a servlet. The steps are given below:
Step 1: First, create a MapReduce driver servlet class. You also need to develop the map and reduce classes. Here is a sample code snippet:
CallJobFromServlet.java
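    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class CallJobFromServlet extends HttpServlet {

        protected void doPost(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {

            Configuration conf = new Configuration();
            // Replace CallJobFromServlet.class name with your servlet class
            Job job = new Job(conf, "CallJobFromServlet.class");
            job.setJarByClass(CallJobFromServlet.class);
            job.setJobName("Job Name");
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            job.setMapperClass(Map.class); // Replace Map.class name with your Mapper class
            job.setNumReduceTasks(30);
            job.setReducerClass(Reducer.class); // Replace Reducer.class name with your Reducer class
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);

            // Job Input path
            FileInputFormat.addInputPath(job, new Path("hdfs://localhost:54310/user/hduser/input/"));
            // Job Output path
            FileOutputFormat.setOutputPath(job, new Path("hdfs://localhost:54310/user/hduser/output"));

            // waitForCompletion also throws InterruptedException and
            // ClassNotFoundException, which doPost may not declare, so wrap them
            try {
                job.waitForCompletion(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new ServletException("Job was interrupted", e);
            } catch (ClassNotFoundException e) {
                throw new ServletException("Job classes not found", e);
            }
        }
    }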
Step 2: Place all the related jar files (Hadoop and application-specific jars) inside the lib folder of the web server (e.g. Tomcat). This is mandatory for accessing the Hadoop configuration (the Hadoop 'conf' folder holds the configuration XML files, i.e. core-site.xml, hdfs-site.xml, etc.). Just copy the jars from the Hadoop lib folder to the web server (Tomcat) lib directory. The jar names are as follows:
1. commons-beanutils-1.7.0.jar
2. commons-beanutils-core-1.8.0.jar
3. commons-cli-1.2.jar
4. commons-collections-3.2.1.jar
5. commons-configuration-1.6.jar
6. commons-httpclient-3.0.1.jar
7. commons-io-2.1.jar
8. commons-lang-2.4.jar
9. commons-logging-1.1.1.jar
10. hadoop-client-1.0.4.jar
11. hadoop-core-1.0.4.jar
12. jackson-core-asl-1.8.8.jar
13. jackson-mapper-asl-1.8.8.jar
14. jersey-core-1.8.jar
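Because a single missing jar usually only shows up as a ClassNotFoundException at request time, it may help to check the lib folder before deploying. Here is a small sketch of such a check using only the standard library. LibCheck is a made-up helper, the default path /opt/tomcat/lib is an assumption about where Tomcat is installed, and the list is shortened to a few of the jars named above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LibCheck {

    /** Returns the required jar names that are not present in libDir, in the order given. */
    public static List<String> missingJars(File libDir, List<String> required) {
        // File.list() returns null when libDir does not exist or is not a directory
        String[] names = libDir.list();
        Set<String> present = new HashSet<>();
        if (names != null) {
            present.addAll(Arrays.asList(names));
        }
        List<String> missing = new ArrayList<>();
        for (String jar : required) {
            if (!present.contains(jar)) {
                missing.add(jar);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed location; pass your own lib directory as the first argument
        File libDir = new File(args.length > 0 ? args[0] : "/opt/tomcat/lib");
        List<String> required = Arrays.asList(
                "commons-cli-1.2.jar",
                "commons-configuration-1.6.jar",
                "commons-logging-1.1.1.jar",
                "hadoop-core-1.0.4.jar");
        List<String> missing = missingJars(libDir, required);
        if (missing.isEmpty()) {
            System.out.println("All required jars found in " + libDir);
        } else {
            System.out.println("Missing from " + libDir + ": " + missing);
        }
    }
}
```

The check compares exact file names, so a jar with a different version number counts as missing.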
Step 3: Deploy your web application to the web server (in the 'webapps' folder for Tomcat).
Step 4: Create a JSP file and point the form's action attribute at the servlet class (CallJobFromServlet.java). Here is a sample code snippet:
Index.jsp
    <!-- method="post" is needed because the servlet only implements doPost -->
    <form id="trigger_hadoop" name="trigger_hadoop" method="post" action="./CallJobFromServlet">
        <span class="back">Trigger Hadoop Job from Web Page</span>
        <input type="submit" name="submit" value="Trigger Job" />
    </form>