 

How to run a MapReduce job using the java -jar command

I wrote a MapReduce job in Java and set the configuration as follows:

    Configuration configuration = new Configuration();

    configuration.set("fs.defaultFS", "hdfs://127.0.0.1:9000");
    configuration.set("mapreduce.job.tracker", "localhost:54311");

    configuration.set("mapreduce.framework.name", "yarn");
    configuration.set("yarn.resourcemanager.address", "localhost:8032");

I ran it in several different ways:

Case 1: using the hadoop and yarn commands: works fine.

Case 2: using Eclipse: works fine.

Case 3: using java -jar after removing all the configuration.set() calls:

    Configuration configuration = new Configuration();

The job runs successfully, but its status is not displayed on YARN (default port 8088).

Case 4: using java -jar with the configuration shown above: error.

Stack trace:

    Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
        at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
        at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
        at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
        at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
        at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
        at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
        at com.my.cache.run.MyTool.run(MyTool.java:38)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at com.my.main.Main.main(Main.java:45)

Could you please tell me how to run a MapReduce job using the java -jar command while still being able to check the job status and logs on YARN (default port 8088)?

Why I need this: I want to create a web service that submits a MapReduce job, without using the Java Runtime API to execute the yarn or hadoop command.

asked Nov 10 '22 by Tinku


1 Answer

In my opinion, it's quite difficult to run a Hadoop application without the hadoop command. You are better off using hadoop jar than java -jar.

It sounds like you may not have a working Hadoop environment on your machine. First, make sure Hadoop itself is running well.

Personally, I prefer to set the configuration in mapred-site.xml, core-site.xml, yarn-site.xml, and hdfs-site.xml rather than in code. There are clear tutorials on installing a Hadoop cluster available online.
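
As a rough sketch (not from the original answer), a minimal mapred-site.xml and yarn-site.xml for a single-node setup might look something like this; the values shown are common defaults and may differ on your cluster:

    <!-- mapred-site.xml: run MapReduce on YARN -->
    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>

    <!-- yarn-site.xml: minimal ResourceManager / NodeManager settings -->
    <configuration>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>localhost</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
    </configuration>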

At this point, you can monitor HDFS on port 50070, the YARN cluster on port 8088, and the MapReduce job history on port 19888.
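
Assuming a single-node setup with the default ports, those web UIs are typically reachable at:

    http://localhost:50070    # HDFS NameNode UI
    http://localhost:8088     # YARN ResourceManager UI
    http://localhost:19888    # MapReduce JobHistory server UI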

Then you should verify that your HDFS and YARN environments are running well. For HDFS, try simple commands like mkdir, copyToLocal, copyFromLocal, etc., and for YARN, run the sample wordcount project.
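
For example (a hedged sketch, assuming a standard Hadoop 2.x install with HADOOP_HOME set; file and directory names are placeholders):

    # HDFS sanity check
    hdfs dfs -mkdir -p /user/test
    hdfs dfs -copyFromLocal localfile.txt /user/test/
    hdfs dfs -ls /user/test
    hdfs dfs -copyToLocal /user/test/localfile.txt ./copy-back.txt

    # YARN sanity check: run the bundled wordcount example
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        wordcount /user/test /user/test-output

If the wordcount job shows up on the ResourceManager UI (port 8088), YARN is working.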

Once your Hadoop environment is working, you can create your own MapReduce application (you can use any IDE), following a MapReduce tutorial if needed. Compile it and package it as a jar.
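
A minimal driver might look like the sketch below. The class names (WordCountTool, TokenMapper, SumReducer) are illustrative placeholders, not the asker's classes; note that it contains no configuration.set() calls, because the cluster settings come from the *-site.xml files that the hadoop command puts on the classpath:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // Hypothetical example: a self-contained word count driver.
    public class WordCountTool extends Configured implements Tool {

        public static class TokenMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer tokens = new StringTokenizer(value.toString());
                while (tokens.hasMoreTokens()) {
                    word.set(tokens.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable value : values) {
                    sum += value.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        @Override
        public int run(String[] args) throws Exception {
            // Uses the configuration injected by ToolRunner / the hadoop command.
            Job job = Job.getInstance(getConf(), "word count");
            job.setJarByClass(WordCountTool.class);
            job.setMapperClass(TokenMapper.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new Configuration(), new WordCountTool(), args));
        }
    }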

Open your terminal and run this command:

hadoop jar <path to jar> <arg1> <arg2> ... <arg n>
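
For example, assuming the placeholder names from the sketch above (wordcount.jar containing WordCountTool):

    hadoop jar wordcount.jar WordCountTool /user/test /user/test-output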

Hope this is helpful.

answered Nov 15 '22 by Whilda Chaq