I have a huge bucket of S3 files that I want to put on HDFS. Given the number of files involved, my preferred solution is to use distributed copy (distcp). However, for some reason I can't get hadoop distcp to take my Amazon S3 credentials. The command I use is:
hadoop distcp -update s3a://[bucket]/[folder]/[filename] hdfs:///some/path/ -D fs.s3a.awsAccessKeyId=[keyid] -D fs.s3a.awsSecretAccessKey=[secretkey] -D fs.s3a.fast.upload=true
However, that behaves the same as if the '-D' arguments weren't there at all:
ERROR tools.DistCp: Exception encountered
java.io.InterruptedIOException: doesBucketExist on [bucket]: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider SharedInstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint
I've looked at the hadoop distcp documentation but can't find an explanation there of why this isn't working. I've tried -Dfs.s3n.awsAccessKeyId as a flag, which didn't work either. I've read that explicitly passing credentials isn't good practice, so maybe this is just a gentle suggestion to do it some other way?
How is one supposed to pass S3 credentials to distcp? Does anyone know?
Some background first: Hadoop's S3A client (URI scheme s3a://) offers high-performance IO against the Amazon S3 object store and compatible implementations. It reads and writes S3 objects directly, works with standard S3 clients, and is compatible with files created by the older s3n:// client and Amazon EMR's s3:// client. There are several options for accessing S3 as a Hadoop filesystem (see the Apache documentation), but S3A is the successor to S3 Native (s3n): it uses Amazon's own libraries to interact with S3, which lets it support larger files (no more 5 GB limit), higher-performance operations, and more.
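For what it's worth, the error above lists the providers S3A tries in order, including EnvironmentVariableCredentialsProvider, so one alternative to command-line flags is to export the standard AWS environment variables before launching distcp. This is only a sketch with placeholder values, and on a real cluster the variables are only guaranteed to be visible to the client process that plans the copy, not necessarily to the map tasks that do the copying:

# Assumed placeholders: [accesskey], [secretkey], [bucket], [folder], [filename]
export AWS_ACCESS_KEY_ID=[accesskey]
export AWS_SECRET_ACCESS_KEY=[secretkey]

hadoop distcp -update \
    s3a://[bucket]/[folder]/[filename] hdfs:///some/path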
It turns out the format of the credential properties has changed from the older s3n client: for s3a the correct keys are fs.s3a.access.key and fs.s3a.secret.key, not fs.s3a.awsAccessKeyId. Note also that -D is a generic Hadoop option and must come before the distcp-specific arguments. The following command works:
hadoop distcp \
-Dfs.s3a.access.key=[accesskey] \
-Dfs.s3a.secret.key=[secretkey] \
-Dfs.s3a.fast.upload=true \
-update \
s3a://[bucket]/[folder]/[filename] hdfs:///some/path
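If you'd rather keep the secrets off the command line entirely (the "not good practice" concern from the question), one approach described in the Hadoop S3A documentation is to store them in a Hadoop credential provider and point distcp at it. A rough sketch, where the jceks path and placeholders are assumptions you'd adapt to your cluster:

# Store both keys in a JCEKS keystore on HDFS; each command prompts for the value.
hadoop credential create fs.s3a.access.key \
    -provider jceks://hdfs/user/[username]/s3.jceks
hadoop credential create fs.s3a.secret.key \
    -provider jceks://hdfs/user/[username]/s3.jceks

# Reference the keystore instead of passing the keys as -D properties.
hadoop distcp \
    -Dhadoop.security.credential.provider.path=jceks://hdfs/user/[username]/s3.jceks \
    -update \
    s3a://[bucket]/[folder]/[filename] hdfs:///some/path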