I ran a MapReduce program using the command hadoop jar <jar> [mainClass] path/to/input path/to/output
. However, my job was hanging at: INFO mapreduce.Job: map 100% reduce 29%
.
Much later, I terminated and checked the datanode log (I am running in pseudo-distributed mode). It contained the following exception:
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:849)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
at java.lang.Thread.run(Thread.java:745)
5 seconds later in the log was ERROR DataXceiver error processing WRITE_BLOCK operation
.
What problem might be causing this exception and error?
My NodeHealthReport said:
1/1 local-dirs are bad: /home/$USER/hadoop/nm-local-dir;
1/1 log-dirs are bad: /home/$USER/hadoop-2.7.1/logs/userlogs
I found this which indicates that dfs.datanode.max.xcievers
may need to be increased. However, it is deprecated and the new property is called dfs.datanode.max.transfer.threads
with default value 4096. If changing this would fix my problem, what new value should I set it to?
This indicates that the ulimit
for the datanode may need to be increased. My ulimit -n
(open files) is 1024. If increasing this would fix my problem, what should I set it to?
Premature EOF can occur due to multiple reasons, one of which is spawning of huge number of threads to write to disk on one reducer node using FileOutputCommitter. MultipleOutputs class allows you to write to files with custom names and to accomplish that, it spawns one thread per file and binds a port to it to write to the disk. Now this puts a limitation on the number of files that could be written to at one reducer node. I encountered this error when the number of files crossed 12000 roughly on one reducer node, as the threads got killed and the _temporary folder got deleted leading to plethora of these exception messages. My guess is - this is not a memory overshoot issue, nor it could be solved by allowing hadoop engine to spawn more threads. Reducing the number of files being written at one time at one node solved my problem - either by reducing the actual number of files being written, or by increasing reducer nodes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With