 

s3-dist-cp fails with OutOfMemoryError when I upgrade from EMR 5.7 to EMR 5.8

I have been using s3-dist-cp to move compressed JSON files from S3 to HDFS as part of a bigger job. I started with EMR 5.4 and have upgraded through most 5.x releases; I currently run a 32-machine cluster on EMR 5.7 with no problems.

When I attempted to upgrade to EMR 5.8, the s3-dist-cp job failed as shown below. Has anything changed between 5.7 and 5.8 that would cause this?

#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p
kill -9 %p"
#   Executing /bin/sh -c "kill -9 11042
kill -9 11042"...
/usr/share/aws/emr/s3-dist-cp/bin/s3-dist-cp: line 55: 11042 Killed                  hadoop jar "$S3_DIST_CP_JAR" -libjars "$LIBJARS" "$@"
Traceback (most recent call last):
  ...
gae123 asked Nov 23 '25 22:11



1 Answer

It might be too late, but yes, there was a bug in s3-dist-cp that caused jobs to fail on emr-5.8.0 that would otherwise work on emr-5.7.0. The bug probably causes the OOM in the S3DistCp client because the client consumes more memory while listing S3 objects, before the MapReduce job is actually submitted. It was fixed in emr-5.9.0.
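If you pin release labels in provisioning scripts, a small guard can flag the affected range before a cluster is launched. The following is only a sketch: `parse_release_label` and `is_affected` are hypothetical helper names, the version bounds come solely from this answer (broken in emr-5.8.0, fixed in emr-5.9.0), and treating any 5.8.x patch release as affected is an assumption rather than something confirmed here.

```python
import re
from typing import Tuple

# Bounds taken from the answer: broken in emr-5.8.0, fixed in emr-5.9.0.
FIRST_AFFECTED = (5, 8, 0)
FIRST_FIXED = (5, 9, 0)


def parse_release_label(label: str) -> Tuple[int, ...]:
    """Turn a label such as 'emr-5.8.0' into a comparable tuple (5, 8, 0)."""
    match = re.fullmatch(r"emr-(\d+)\.(\d+)\.(\d+)", label.strip())
    if match is None:
        raise ValueError(f"unrecognized EMR release label: {label!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(label: str) -> bool:
    """Hypothetical check: True if the label falls in the reported broken range.

    Assumption: every release from 5.8.0 up to (not including) 5.9.0 is
    treated as affected; only 5.8.0 itself is named in the answer.
    """
    return FIRST_AFFECTED <= parse_release_label(label) < FIRST_FIXED


if __name__ == "__main__":
    for label in ("emr-5.7.0", "emr-5.8.0", "emr-5.9.0"):
        status = "affected" if is_affected(label) else "not affected"
        print(f"{label}: {status}")
```

Comparing parsed tuples rather than raw strings keeps the ordering correct for two-digit minor versions such as emr-5.10.0.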

jc mannem answered Nov 27 '25 00:11



