
Hadoop WordCount example stuck at map 100% reduce 0%

[hadoop-1.0.2] → hadoop jar hadoop-examples-1.0.2.jar wordcount /user/abhinav/input /user/abhinav/output
Warning: $HADOOP_HOME is deprecated.

****hdfs://localhost:54310/user/abhinav/input
12/04/15 15:52:31 INFO input.FileInputFormat: Total input paths to process : 1
12/04/15 15:52:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/04/15 15:52:31 WARN snappy.LoadSnappy: Snappy native library not loaded
12/04/15 15:52:31 INFO mapred.JobClient: Running job: job_201204151241_0010
12/04/15 15:52:32 INFO mapred.JobClient:  map 0% reduce 0%
12/04/15 15:52:46 INFO mapred.JobClient:  map 100% reduce 0%

I've set up Hadoop on a single node using this guide (http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/#run-the-mapreduce-job) and I'm trying to run the provided WordCount example, but the job gets stuck at map 100% reduce 0%. What could be causing this?

asked Apr 15 '12 by Abhinav Sharma

1 Answer

First of all, open up your JobTracker and look at the number of free reducer slots and the other running jobs - is another job consuming all the free reducer slots as they become available?
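If you'd rather check this from the shell, Hadoop 1.x can list running jobs directly (a quick sketch using the stock hadoop job command):

# List all currently running jobs and their states - another job
# hogging the cluster's reduce slots will show up here alongside yours
hadoop job -list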

Once you've confirmed that there are free reducer slots available to run a reducer for your job, locate your job in the JobTracker web UI and click on it to open it up. You should now be able to see the number of completed mappers - check that no mappers are still running. The % complete in the console sometimes lies: a mapper that is still in the process of committing can report 100% while it's actually having a problem finalizing.
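The same completion numbers are also available from the shell via hadoop job -status, using the job ID printed in your console output above:

# Show the job's progress; the output includes (among other things)
# the map() and reduce() completion fractions
hadoop job -status job_201204151241_0010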

Once you're satisfied that all your mappers have finished, look at the number of running reducers - does it show 0? If some are shown as running, click on the number of running reducers to bring up the running reducers page, then click through on an instance until you get the option to view the logs for that reducer. You'll want to view all the logs for this reducer (not just the first / last 100k). They should tell you what your reducer is actually doing - most probably trying to copy the map outputs over to the reducer node. I imagine this is where your problem is: either network or disk space. Either way, Hadoop should eventually fail the reducer attempt and reschedule it to run on another node.
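Two quick shell checks for those suspects. The log path below is an assumption based on a default single-node install like the one in the tutorial, so adjust it if you've changed HADOOP_LOG_DIR:

# A full disk will stall the shuffle/copy phase indefinitely
df -h

# Raw reducer task-attempt logs (the web UI serves these same files);
# reducer attempts have _r_ in their IDs. Path assumes the default
# $HADOOP_HOME/logs layout
find $HADOOP_HOME/logs/userlogs -path '*_r_*' -name syslog | xargs tail -n 50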

answered Sep 21 '22 by Chris White