I'm running a parsing job in Hadoop. The source is an 11 GB map file with about 900,000 binary records, each representing an HTML file. The map task extracts links and writes them to the context. I have no reducer written for this job.
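For context, the mapper is shaped roughly like this (a simplified sketch: LinkExtractor and the key/value types are placeholders standing in for my actual parsing code):

import java.io.IOException;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LinkExtractMapper extends Mapper<Text, BytesWritable, Text, Text> {
    @Override
    protected void map(Text url, BytesWritable html, Context context)
            throws IOException, InterruptedException {
        // Parse the binary HTML record (getLength() matters, since the
        // array from getBytes() may be padded) and emit one
        // (sourceUrl, link) pair per extracted link.
        // LinkExtractor is a placeholder for my real parsing class.
        for (String link : LinkExtractor.extractLinks(html.getBytes(), html.getLength())) {
            context.write(url, new Text(link));
        }
    }
}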
I'm getting the following error:
org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:124)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:362)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:223)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
    at org.apache.hadoop.mapred.Child.main(Child.java:217)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:58)
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:45)
    at org.apache.hadoop.mapreduce.task.reduce.MapOutput.<init>(MapOutput.java:104)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManager.unconditionalReserve(MergeManager.java:267)
This is my mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>Hadp01:8012</value>
    <description>The host and port that the MapReduce job tracker runs at.
      If "local", then jobs are run in-process as a single map and reduce task.
    </description>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/BigData1/MapReduce,/BigData2/MapReduce</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1536m</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>2048</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>300</value>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>300</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>100</value>
  </property>
  <property>
    <name>io.sort.factor</name>
    <value>100</value>
  </property>
  <property>
    <name>tasktracker.http.threads</name>
    <value>80</value>
  </property>
</configuration>
Does anyone have any idea how to fix this? Thank you!
This error is caused by mapreduce.reduce.shuffle.memory.limit.percent. By default:

mapreduce.reduce.shuffle.memory.limit.percent = 0.25

This property caps how large a single map output can be and still be fetched into the reducer's in-memory shuffle buffer; segments above the cap are shuffled to disk instead. Lowering it keeps oversized segments out of the heap, which avoids the OutOfMemoryError in MergeManager above.
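If you want the lower limit as a cluster-wide default, the same property can be added to mapred-site.xml in the style of the configuration above (0.15 is simply the value that worked for me, not a universal constant):

  <property>
    <name>mapreduce.reduce.shuffle.memory.limit.percent</name>
    <value>0.15</value>
  </property>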
To resolve this problem per job, I restricted the reducer's shuffle memory usage. In Hive:

set mapreduce.reduce.shuffle.memory.limit.percent=0.15;
And in a plain MapReduce job:
job.getConfiguration().setStrings("mapreduce.reduce.shuffle.memory.limit.percent", "0.15");
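For completeness, here is roughly where that line sits in a full driver. This is a sketch, not my exact code: the class names are placeholders (LinkExtractDriver, and the LinkExtractMapper from the question), it assumes the Hadoop 2.x Job.getInstance API (older releases use new Job(conf, ...)), and Configuration.set works just as well as setStrings for a single value:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LinkExtractDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Keep any single in-memory shuffle segment under 15% of the
        // shuffle buffer; larger map outputs are fetched to disk instead.
        conf.set("mapreduce.reduce.shuffle.memory.limit.percent", "0.15");

        Job job = Job.getInstance(conf, "link-extract");
        job.setJarByClass(LinkExtractDriver.class);
        job.setMapperClass(LinkExtractMapper.class); // placeholder mapper class
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // MapFile data can be read through SequenceFileInputFormat.
        job.setInputFormatClass(SequenceFileInputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}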