Hadoop reduce stops running

I've run into a puzzling problem. When I run a job in Hadoop, the map phase reaches 100% without any errors. The reduce phase, however, stops making progress once it reaches 67%. I'm new to Hadoop and have searched through a lot of material online, but I still can't work out what is wrong. Here is part of the output:

13/10/25 21:40:00 INFO input.FileInputFormat: Total input paths to process : 2
13/10/25 21:40:01 INFO mapred.JobClient: Running job: job_201310252001_0003
13/10/25 21:40:02 INFO mapred.JobClient:  map 0% reduce 0%
13/10/25 21:40:30 INFO mapred.JobClient:  map 1% reduce 0%
13/10/25 21:40:37 INFO mapred.JobClient:  map 2% reduce 0%
13/10/25 21:40:39 INFO mapred.JobClient:  map 3% reduce 0%
13/10/25 21:40:40 INFO mapred.JobClient:  map 4% reduce 0%
13/10/25 21:40:42 INFO mapred.JobClient:  map 5% reduce 0%
13/10/25 21:40:43 INFO mapred.JobClient:  map 6% reduce 0%
13/10/25 21:40:45 INFO mapred.JobClient:  map 7% reduce 0%
13/10/25 21:40:46 INFO mapred.JobClient:  map 9% reduce 0%
13/10/25 21:40:48 INFO mapred.JobClient:  map 10% reduce 0%
13/10/25 21:40:49 INFO mapred.JobClient:  map 11% reduce 0%
13/10/25 21:40:52 INFO mapred.JobClient:  map 14% reduce 0%
13/10/25 21:40:55 INFO mapred.JobClient:  map 17% reduce 0%
13/10/25 21:40:58 INFO mapred.JobClient:  map 19% reduce 0%
13/10/25 21:41:01 INFO mapred.JobClient:  map 22% reduce 0%
13/10/25 21:41:04 INFO mapred.JobClient:  map 23% reduce 0%
13/10/25 21:41:05 INFO mapred.JobClient:  map 24% reduce 0%
13/10/25 21:41:07 INFO mapred.JobClient:  map 26% reduce 0%
13/10/25 21:41:08 INFO mapred.JobClient:  map 27% reduce 0%
13/10/25 21:41:10 INFO mapred.JobClient:  map 28% reduce 0%
13/10/25 21:41:11 INFO mapred.JobClient:  map 29% reduce 0%
13/10/25 21:41:13 INFO mapred.JobClient:  map 30% reduce 0%
13/10/25 21:41:14 INFO mapred.JobClient:  map 31% reduce 0%
13/10/25 21:41:16 INFO mapred.JobClient:  map 32% reduce 0%
13/10/25 21:41:20 INFO mapred.JobClient:  map 34% reduce 0%
13/10/25 21:41:23 INFO mapred.JobClient:  map 35% reduce 0%
13/10/25 21:41:26 INFO mapred.JobClient:  map 36% reduce 0%
13/10/25 21:41:34 INFO mapred.JobClient:  map 37% reduce 0%
13/10/25 21:41:39 INFO mapred.JobClient:  map 38% reduce 0%
13/10/25 21:41:43 INFO mapred.JobClient:  map 40% reduce 0%
13/10/25 21:41:44 INFO mapred.JobClient:  map 40% reduce 6%
13/10/25 21:41:46 INFO mapred.JobClient:  map 42% reduce 6%
13/10/25 21:41:49 INFO mapred.JobClient:  map 43% reduce 6%
13/10/25 21:41:51 INFO mapred.JobClient:  map 44% reduce 6%
13/10/25 21:41:52 INFO mapred.JobClient:  map 45% reduce 6%
13/10/25 21:41:55 INFO mapred.JobClient:  map 46% reduce 6%
13/10/25 21:41:57 INFO mapred.JobClient:  map 47% reduce 6%
13/10/25 21:41:58 INFO mapred.JobClient:  map 48% reduce 9%
13/10/25 21:42:01 INFO mapred.JobClient:  map 51% reduce 12%
13/10/25 21:42:04 INFO mapred.JobClient:  map 54% reduce 12%
13/10/25 21:42:07 INFO mapred.JobClient:  map 56% reduce 12%
13/10/25 21:42:10 INFO mapred.JobClient:  map 58% reduce 12%
13/10/25 21:42:13 INFO mapred.JobClient:  map 60% reduce 12%
13/10/25 21:42:16 INFO mapred.JobClient:  map 61% reduce 12%
13/10/25 21:42:19 INFO mapred.JobClient:  map 62% reduce 15%
13/10/25 21:42:22 INFO mapred.JobClient:  map 63% reduce 15%
13/10/25 21:42:23 INFO mapred.JobClient:  map 65% reduce 15%
13/10/25 21:42:26 INFO mapred.JobClient:  map 66% reduce 15%
13/10/25 21:42:28 INFO mapred.JobClient:  map 67% reduce 15%
13/10/25 21:42:29 INFO mapred.JobClient:  map 68% reduce 15%
13/10/25 21:42:32 INFO mapred.JobClient:  map 69% reduce 15%
13/10/25 21:42:34 INFO mapred.JobClient:  map 70% reduce 18%
13/10/25 21:42:35 INFO mapred.JobClient:  map 72% reduce 18%
13/10/25 21:42:38 INFO mapred.JobClient:  map 75% reduce 18%
13/10/25 21:42:41 INFO mapred.JobClient:  map 77% reduce 18%
13/10/25 21:42:44 INFO mapred.JobClient:  map 80% reduce 18%
13/10/25 21:42:47 INFO mapred.JobClient:  map 82% reduce 18%
13/10/25 21:42:50 INFO mapred.JobClient:  map 85% reduce 18%
13/10/25 21:42:53 INFO mapred.JobClient:  map 87% reduce 18%
13/10/25 21:42:56 INFO mapred.JobClient:  map 88% reduce 18%
13/10/25 21:42:59 INFO mapred.JobClient:  map 89% reduce 18%
13/10/25 21:43:02 INFO mapred.JobClient:  map 90% reduce 18%
13/10/25 21:43:05 INFO mapred.JobClient:  map 91% reduce 18%
13/10/25 21:43:18 INFO mapred.JobClient:  map 94% reduce 21%
13/10/25 21:43:21 INFO mapred.JobClient:  map 97% reduce 21%
13/10/25 21:43:24 INFO mapred.JobClient:  map 99% reduce 27%
13/10/25 21:43:27 INFO mapred.JobClient:  map 100% reduce 30%
13/10/25 21:43:30 INFO mapred.JobClient:  map 100% reduce 67%
smalliao asked Oct 25 '13 14:10



1 Answer

The symptom suggests that the code in your reduce phase is "stuck": it may be in an infinite loop, it may be receiving a ludicrous amount of data, or something else may be going on (maybe post your reduce code?).

Here is how the percentages work in the reducer:

  1. 0-33% is the shuffle. This is data moving from the mappers to the reducers (see how it starts before the mappers are finished).
  2. 33%-67% is the sort. This can only start when the mappers are finished (see how it goes from 30% to 67% after map is at 100%).
  3. 67%-100% is the actual reduce code you are running. This percentage goes up every time a reduce task completes. None of your reduce tasks are completing.
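To make that breakdown concrete, here is a minimal Python sketch that reads a JobClient progress line like the ones in the question and labels the reduce percentage with a phase. The function names `parse_progress` and `reduce_phase` are made up for illustration, and the 33/67 cutoffs come straight from the list above. With more than one reducer the reported figure is presumably an aggregate, so treat the label as a rough guide only.

```python
import re

# Matches JobClient progress lines such as
# "13/10/25 21:43:30 INFO mapred.JobClient:  map 100% reduce 67%"
PROGRESS_RE = re.compile(r"map (\d+)% reduce (\d+)%")


def parse_progress(line):
    """Return (map_pct, reduce_pct) from a progress line, or None if it is not one."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def reduce_phase(reduce_pct):
    """Label a reduce percentage using the cutoffs from the list above (hypothetical helper)."""
    if reduce_pct < 33:
        return "shuffle"
    if reduce_pct < 67:
        return "sort"
    return "reduce"


# Last line of the output posted in the question.
last_line = "13/10/25 21:43:30 INFO mapred.JobClient:  map 100% reduce 67%"
map_pct, reduce_pct = parse_progress(last_line)
print(map_pct, reduce_pct, reduce_phase(reduce_pct))
```

For the last line in the question this labels the job as being in the reduce phase, which is consistent with the shuffle and sort having finished and the reduce code itself not advancing.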

In the JobTracker interface, look at your job and see how much data the reducers are receiving. If the number of records in the reducer keeps going up, you probably have too much data going to the reducers. If that number stays still, you might have an infinite loop of some sort.
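That check can be written down as a small Python sketch. It assumes you note the reducers' input record count from the JobTracker page by hand a few times, some minutes apart, and pass the values in oldest first. The function name `diagnose` and the sample numbers are invented for illustration, and the result is only a hint about where to look next, not a verdict.

```python
def diagnose(samples):
    """Compare reducer input record counts noted at different times (oldest first).

    This is a hypothetical helper: the counts are assumed to be copied by hand
    from the JobTracker interface, not fetched through any Hadoop API.
    """
    if len(samples) < 2:
        return "need at least two samples"
    if samples[-1] > samples[0]:
        # Records are still arriving, so the reducers may simply be overloaded.
        return "count rising: possibly too much data going to the reducers"
    # No new records between samples, so the reduce code itself may be stuck.
    return "count flat: possibly an infinite loop in the reduce code"


# Made-up example values, not taken from the job in the question.
print(diagnose([1000, 5000, 9000]))
print(diagnose([4200, 4200, 4200]))
```

In the flat case, the reduce code is the first place to look, which is why posting it would help.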

Donald Miner answered Sep 28 '22 01:09