
Can't run a MapReduce job on hadoop 2.4.0

I am new to Hadoop. I have configured Hadoop 2.4.0 with jdk1.7.60 on a cluster of 3 machines, and I am able to run all the hadoop commands. I modified the WordCount example and built a jar file from it. This jar ran on Hadoop 1.2.1 and produced the expected result, but on Hadoop 2.4.0 I get no result.

Command used for execution:

$ hadoop jar WordCount.jar WordCount /data/webdocs.dat /output

I get the following messages after submitting the job:

14/06/29 19:35:18 INFO client.RMProxy: Connecting to ResourceManager at /192.168.2.140:8040
14/06/29 19:35:18 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
14/06/29 19:35:19 INFO input.FileInputFormat: Total input paths to process : 1
14/06/29 19:35:19 INFO mapreduce.JobSubmitter: number of splits:12
14/06/29 19:35:19 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1403905542893_0004
14/06/29 19:35:19 INFO impl.YarnClientImpl: Submitted application application_1403905542893_0004
14/06/29 19:35:19 INFO mapreduce.Job: The url to track the job: http://192.168.2.140:8088/proxy/application_1403905542893_0004/
14/06/29 19:35:19 INFO mapreduce.Job: Running job: job_1403905542893_0004

After this point the output does not change. I waited 15 to 20 minutes, but nothing further was printed.

This is what I see on resource manager's web page regarding the job:

State - ACCEPTED
FinalStatus - UNDEFINED
Progress - (progress bar at 0%)
Tracking UI - UNASSIGNED

Apps Submitted - 1
Apps Pending - 1
Apps Running - 0

I also tried the equivalent yarn command, with the same result:

$ yarn jar WordCount.jar WordCount /data/webdocs.dat /output

Here is the output of jps:

21485 NameNode
23142 DataNode
28504 Jps
21704 ResourceManager
22082 JobHistoryServer

Any help or guidance will be highly appreciated.

asked Jun 30 '14 00:06 by user2670999




1 Answer

I solved the problem. It was caused by a mistake in the Hadoop configuration: the ResourceManager was hitting a bind exception on port 8040.
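A bind exception like this is normally recorded in the daemon logs. As a minimal sketch, assuming the logs are in the default `$HADOOP_HOME/logs` directory with the usual `yarn-<user>-<daemon>-<host>.log` file names (neither is stated in this post), a small helper can search them. The function name `find_bind_errors` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print every line mentioning BindException,
# prefixed with its line number, from the log files given as arguments.
find_bind_errors() {
    grep -n 'BindException' "$@"
}

# Example call (the log location is an assumption, adjust to your install):
#   find_bind_errors "$HADOOP_HOME"/logs/yarn-*-resourcemanager-*.log
```

If nothing is printed, the NodeManager logs in the same directory are worth searching the same way.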

I changed yarn-site.xml from (old yarn-site.xml):

<configuration>
<!-- Site specific YARN configuration properties -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>192.168.2.140:8025</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>192.168.2.140:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>192.168.2.140:8040</value>
</property>
</configuration>

To (new yarn-site.xml):

<configuration>
 <property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
 </property>
 <property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
 </property>
</configuration>

I deleted the other properties from the configuration. Then I ran the following commands to start the resourcemanager and nodemanager:

$ yarn-daemon.sh start nodemanager
$ yarn-daemon.sh start resourcemanager
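The jps listing in the question shows a ResourceManager but no NodeManager on that host. With no NodeManager registered, YARN has nowhere to launch containers, which is consistent with an application that stays in ACCEPTED. A rough sketch of a check to run after starting the daemons follows; the function name `missing_yarn_daemons` is hypothetical, and it only inspects the host where jps is run:

```shell
#!/bin/sh
# Hypothetical helper: reads `jps` output on stdin and prints the name of
# each expected YARN daemon that does not appear in it.
missing_yarn_daemons() {
    jps_output=$(cat)
    for daemon in ResourceManager NodeManager; do
        # jps prints "<pid> <name>"; -w matches the name as a whole word
        if ! printf '%s\n' "$jps_output" | grep -qw "$daemon"; then
            echo "$daemon"
        fi
    done
}

# Example call on a node that should run both daemons:
#   jps | missing_yarn_daemons
```

No output means both daemons were found. On a multi-node cluster the worker machines would only be expected to show NodeManager.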

Then I ran my job again and it completed successfully.
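One possible explanation for the conflict, offered as an assumption because the answer does not say which process already held the port: 8040 is the default port of yarn.nodemanager.localizer.address, so a ResourceManager configured on 8040 and a NodeManager on the same host would compete for it. To review which properties in a yarn-site.xml pin an explicit host:port, something like the following could be used. It is a sketch that assumes each <name> and <value> element sits on its own line, as in the files above, and `list_address_properties` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print "name value" for every property in the given
# yarn-site.xml whose value ends in ":<port>".
# Assumes one <name> or <value> element per line.
list_address_properties() {
    awk '
        /<name>/  { gsub(/.*<name>|<\/name>.*/, "");  name = $0 }
        /<value>/ { gsub(/.*<value>|<\/value>.*/, "");
                    if ($0 ~ /:[0-9]+$/) print name, $0 }
    ' "$1"
}

# Example call (the path is an assumption, adjust to your install):
#   list_address_properties "$HADOOP_HOME/etc/hadoop/yarn-site.xml"
```

Run against the old file above, this would list the three yarn.resourcemanager.* address properties; against the new file it prints nothing.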

answered Oct 08 '22 23:10 by user2670999