I'm running a parsing job in Hadoop. The source is an 11 GB map file with about 900,000 binary records, each representing an HTML file. The map task extracts links and writes them to the context. I have no reducer written for this job.
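For context, the mapper is shaped roughly like this (a simplified sketch: LinkExtractor and the key/value types are placeholders standing in for my actual parsing code):

import java.io.IOException;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LinkExtractMapper extends Mapper<Text, BytesWritable, Text, Text> {
    @Override
    protected void map(Text url, BytesWritable html, Context context)
            throws IOException, InterruptedException {
        // Parse the binary HTML record (getLength() matters, since the
        // array from getBytes() may be padded) and emit one
        // (sourceUrl, link) pair per extracted link.
        // LinkExtractor is a placeholder for my real parsing class.
        for (String link : LinkExtractor.extractLinks(html.getBytes(), html.getLength())) {
            context.write(url, new Text(link));
        }
    }
}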
I'm getting the following error:
org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:124)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:362)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:223)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
    at org.apache.hadoop.mapred.Child.main(Child.java:217)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:58)
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:45)
    at org.apache.hadoop.mapreduce.task.reduce.MapOutput.<init>(MapOutput.java:104)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManager.unconditionalReserve(MergeManager.java:267)
This is my mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>Hadp01:8012</value>
    <description>The host and port that the MapReduce job tracker runs at.
      If "local", then jobs are run in-process as a single map and reduce task.
    </description>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/BigData1/MapReduce,/BigData2/MapReduce</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1536m</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>2048</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>300</value>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>300</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>100</value>
  </property>
  <property>
    <name>io.sort.factor</name>
    <value>100</value>
  </property>
  <property>
    <name>tasktracker.http.threads</name>
    <value>80</value>
  </property>
</configuration>
Does anyone have any idea how to fix this? Thank you!
This error is caused by mapreduce.reduce.shuffle.memory.limit.percent. By default:

mapreduce.reduce.shuffle.memory.limit.percent = 0.25

This property caps how large a single map output can be and still be fetched into the reducer's in-memory shuffle buffer; segments above the cap are shuffled to disk instead. Lowering it keeps oversized segments out of the heap, which avoids the OutOfMemoryError in MergeManager above.
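If you want the lower limit as a cluster-wide default, the same property can be added to mapred-site.xml in the style of the configuration above (0.15 is simply the value that worked for me, not a universal constant):

  <property>
    <name>mapreduce.reduce.shuffle.memory.limit.percent</name>
    <value>0.15</value>
  </property>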
To resolve this problem per job, I restricted the reducer's shuffle memory usage. In Hive:

set mapreduce.reduce.shuffle.memory.limit.percent=0.15;
And in a plain MapReduce job:
job.getConfiguration().setStrings("mapreduce.reduce.shuffle.memory.limit.percent", "0.15");
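For completeness, here is roughly where that line sits in a full driver. This is a sketch, not my exact code: the class names are placeholders (LinkExtractDriver, and the LinkExtractMapper from the question), it assumes the Hadoop 2.x Job.getInstance API (older releases use new Job(conf, ...)), and Configuration.set works just as well as setStrings for a single value:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LinkExtractDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Keep any single in-memory shuffle segment under 15% of the
        // shuffle buffer; larger map outputs are fetched to disk instead.
        conf.set("mapreduce.reduce.shuffle.memory.limit.percent", "0.15");

        Job job = Job.getInstance(conf, "link-extract");
        job.setJarByClass(LinkExtractDriver.class);
        job.setMapperClass(LinkExtractMapper.class); // placeholder mapper class
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // MapFile data can be read through SequenceFileInputFormat.
        job.setInputFormatClass(SequenceFileInputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}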