I am copying HDFS snapshot to S3 bucket, getting below error: The command i am executing is: hadoop distcp /.snapshot/$SNAPSHOTNAME s3a://$ACCESSKEY:$SECRETKEY@$BUCKET/$SNAPSHOTNAME
15/08/20 06:50:07 INFO mapreduce.Job: map 38% reduce 0%
15/08/20 06:50:08 INFO mapreduce.Job: map 39% reduce 0%
15/08/20 06:52:15 INFO mapreduce.Job: map 41% reduce 0%
15/08/20 06:52:37 INFO mapreduce.Job: Task Id : attempt_1439998402428_0006_m_000004_0, Status : FAILED
Error: java.io.IOException: File copy failed: hdfs://mycluster/.snapshot/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/tmp/hive/vladmetodiev/6da8eee9-f482-4d07-96dc-87ff77a4efe4/hive_2015-07-23_17-12-21_989_8312247652079703611-121/-ext-10001/000035_0 --> s3n://AKIAJPPHQ6RXAPWCFMAA:RVZ9Q1+ezHkUVPEbasg4BUIGAS59C27bhJiNNlgD@ul-pdc-eu/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/tmp/hive/vladmetodiev/6da8eee9-f482-4d07-96dc-87ff77a4efe4/hive_2015-07-23_17-12-21_989_8312247652079703611-121/-ext-10001/000035_0
at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:284)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:252)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://mycluster/.snapshot/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/tmp/hive/vladmetodiev/6da8eee9-f482-4d07-96dc-87ff77a4efe4/hive_2015-07-23_17-12-21_989_8312247652079703611-121/-ext-10001/000035_0 to s3n://AKIAJPPHQ6RXAPWCFMAA:RVZ9Q1+ezHkUVPEbasg4BUIGAS59C27bhJiNNlgD@ul-pdc-eu/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/tmp/hive/vladmetodiev/6da8eee9-f482-4d07-96dc-87ff77a4efe4/hive_2015-07-23_17-12-21_989_8312247652079703611-121/-ext-10001/000035_0
at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:280)
... 10 more
Caused by: java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
at java.security.DigestOutputStream.write(DigestOutputStream.java:148)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.write(NativeS3FileSystem.java:293)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:255)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToFile(RetriableFileCopyCommand.java:184)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:124)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:100)
at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
... 11 more
15/08/20 06:53:13 INFO mapreduce.Job: Task Id : attempt_1439998402428_0006_m_000007_0, Status : FAILED
Error: java.io.IOException: File copy failed: hdfs://mycluster/.snapshot/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/apps/hbase/data/data/default/XXHBCL01/985fbc7692868e3315ada852bcb59e1d/tr/77c160e32bfc4175a65d6a56feaeeb6c --> s3n://AKIAJPPHQ6RXAPWCFMAA:RVZ9Q1+ezHkUVPEbasg4BUIGAS59C27bhJiNNlgD@ul-pdc-eu/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/apps/hbase/data/data/default/XXHBCL01/985fbc7692868e3315ada852bcb59e1d/tr/77c160e32bfc4175a65d6a56feaeeb6c
at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:284)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:252)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://mycluster/.snapshot/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/apps/hbase/data/data/default/XXHBCL01/985fbc7692868e3315ada852bcb59e1d/tr/77c160e32bfc4175a65d6a56feaeeb6c to s3n://AKIAJPPHQ6RXAPWCFMAA:RVZ9Q1+ezHkUVPEbasg4BUIGAS59C27bhJiNNlgD@ul-pdc-eu/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/apps/hbase/data/data/default/XXHBCL01/985fbc7692868e3315ada852bcb59e1d/tr/77c160e32bfc4175a65d6a56feaeeb6c
at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:280)
... 10 more
Caused by: java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
at java.security.DigestOutputStream.write(DigestOutputStream.java:148)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.write(NativeS3FileSystem.java:293)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:255)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToFile(RetriableFileCopyCommand.java:184)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:124)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:100)
at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
... 11 more
However there is enough space on device aroubnt 4 TB, Please help.
So I ran into this same problem and here is the what ultimately worked for me:
hadoop distcp -D mapreduce.job.maxtaskfailures.per.tracker=1 ...
I tried a few things (with the help of a colleagues) but the main thing that worked for me was - Changed max task failures per tracker to 1. This is mostly the key. Basically individual nodes were running out of space. So by doing this I am forcing the job not to retry on a node once it has failed on it already.
Other things that I tried but didn't work 1. Increase number of mappers. (-m ) 2. Increased the number of retries from 3 to 12. (-D yarn.app.mapreduce.client.max-retries=12)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With