
HDFS to S3 DistCp - Access Keys

To copy a file from HDFS to an S3 bucket I used the following command:

hadoop distcp -Dfs.s3a.access.key=ACCESS_KEY_HERE \
  -Dfs.s3a.secret.key=SECRET_KEY_HERE \
  /path/in/hdfs s3a://BUCKET_NAME/

But the access key and secret key are visible in the command, which is not secure. Is there any method to provide the credentials from a file? I don't want to edit the config file, which is one method I came across.
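
For what it's worth, Hadoop ships a credential provider framework that keeps the keys in an encrypted JCEKS keystore file rather than on the command line or in a config file. A minimal sketch, where the keystore path jceks://hdfs/user/vishal/s3.jceks is a placeholder:

# Store the keys in an encrypted keystore (each command prompts for the value);
# the keystore path below is a hypothetical example
hadoop credential create fs.s3a.access.key -provider jceks://hdfs/user/vishal/s3.jceks
hadoop credential create fs.s3a.secret.key -provider jceks://hdfs/user/vishal/s3.jceks

# Point distcp at the keystore instead of passing keys on the command line
hadoop distcp \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/vishal/s3.jceks \
  /path/in/hdfs s3a://BUCKET_NAME/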

Vishal asked Jan 06 '23 16:01

1 Answer

I also faced the same situation, and solved it by getting temporary credentials from the instance metadata. (In case you're using an IAM user's credentials, please note that the temporary credentials mentioned here come from an IAM role, which is attached to the EC2 server rather than to a person; refer to http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html)
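
For reference, on an EC2 instance with a role attached, these temporary credentials can be fetched from the instance metadata endpoint. A quick sketch, where the role name my-distcp-role is a placeholder:

# List the role(s) attached to this instance
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/

# Fetch the temporary credentials for that role (role name is a placeholder);
# the response is JSON containing AccessKeyId, SecretAccessKey and Token
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/my-distcp-role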

I found that only specifying the credentials in the hadoop distcp command will not work. You also have to set the config fs.s3a.aws.credentials.provider. (Refer to http://hortonworks.github.io/hdp-aws/s3-security/index.html#using-temporary-session-credentials)

The final command will look like the one below:

hadoop distcp \
  -Dfs.s3a.aws.credentials.provider="org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider" \
  -Dfs.s3a.access.key="{AccessKeyId}" \
  -Dfs.s3a.secret.key="{SecretAccessKey}" \
  -Dfs.s3a.session.token="{SessionToken}" \
  s3a://bucket/prefix/file /path/on/hdfs
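
As a side note, if the job runs directly on EC2 instances that already have the role attached, it may be possible to skip passing keys entirely by letting the S3A connector read the instance profile itself. A sketch, assuming your Hadoop version supports this provider class:

# No keys on the command line; the S3A connector pulls the
# temporary credentials from the instance metadata itself
hadoop distcp \
  -Dfs.s3a.aws.credentials.provider=com.amazonaws.auth.InstanceProfileCredentialsProvider \
  s3a://bucket/prefix/file /path/on/hdfs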
Fan answered Jan 14 '23 12:01