I have a map-reduce application running on AWS EMR that writes some output to an S3 bucket owned by a different AWS account. I have the permissions set up and the job can write to the external bucket, but the object owner is still the root of the account where the Hadoop job is running. I would like to change this to the external account that owns the bucket.
I found that I can set fs.s3a.acl.default to bucket-owner-full-control, but that doesn't seem to be working. This is what I am doing:
// conf is an org.apache.hadoop.conf.Configuration; s3Path, filePath and
// contentAsString are defined elsewhere in the job.
conf.set("fs.s3a.acl.default", "bucket-owner-full-control");
FileSystem fileSystem = FileSystem.get(URI.create(s3Path), conf);
FSDataOutputStream fsDataOutputStream = fileSystem.create(new Path(filePath));
PrintWriter writer = new PrintWriter(fsDataOutputStream);
writer.write(contentAsString);
writer.close();
fsDataOutputStream.close();
Any help is appreciated.
conf.set("fs.s3a.acl.default", "bucket-owner-full-control");
is the right property to set.
It is the core-site.xml property that gives full control to the bucket owner:
<property>
  <name>fs.s3a.acl.default</name>
  <value>bucket-owner-full-control</value>
  <description>Set a canned ACL for newly created and copied objects. Value may be private,
    public-read, public-read-write, authenticated-read, log-delivery-write,
    bucket-owner-read, or bucket-owner-full-control.</description>
</property>
BucketOwnerFullControl
Specifies that the owner of the bucket is granted Permission.FullControl. The owner of the bucket is not necessarily the same as the owner of the object.
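To make the effect concrete, here is a minimal sketch at the SDK level (my illustration, not part of the original answer; it assumes the AWS SDK for Java v1 and placeholder bucket, key and file names). The Hadoop properties above essentially make the connector attach this canned ACL to the PutObject requests it issues.
import java.io.File;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.CannedAccessControlList;
import com.amazonaws.services.s3.model.PutObjectRequest;

public class CannedAclPutExample {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // Placeholder bucket, key and local file, for illustration only.
        PutObjectRequest request =
                new PutObjectRequest("external-bucket", "output/part-00000", new File("part-00000"))
                        .withCannedAcl(CannedAccessControlList.BucketOwnerFullControl);
        // The bucket owner is granted FULL_CONTROL on the new object.
        s3.putObject(request);
    }
}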
I also recommend setting fs.s3.canned.acl to BucketOwnerFullControl.
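For example (a small sketch, not from the original answer; conf is the same Hadoop Configuration as in the question, and fs.s3.canned.acl is the canned-ACL option read by EMR's own s3:// connector):
// Canned ACL for EMR's s3:// (EMRFS) connector; note the enum-style value.
conf.set("fs.s3.canned.acl", "BucketOwnerFullControl");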
For debugging, you can use the snippet below to see which parameters are actually being passed:
// Configuration is iterable as java.util.Map.Entry<String, String> pairs.
for (Entry<String, String> entry : conf) {
    System.out.printf("%s=%s\n", entry.getKey(), entry.getValue());
}
For testing purposes, run this command from the command line:
aws s3 cp s3://bucket/source/dummyfile.txt s3://bucket/target/dummyfile.txt --sse --acl bucket-owner-full-control
If this works from the CLI, it should also work through the API.
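To confirm which ACL actually landed on the object, one option (a sketch of my own, assuming the AWS SDK for Java v1 and the same placeholder bucket/key as the command above) is to read the object ACL back and print its grants:
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.AccessControlList;
import com.amazonaws.services.s3.model.Grant;

public class CheckObjectAcl {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // Placeholder bucket/key; point this at the object written by the test copy above.
        AccessControlList acl = s3.getObjectAcl("bucket", "target/dummyfile.txt");
        for (Grant grant : acl.getGrantsAsList()) {
            // Expect the bucket owner's canonical ID with FULL_CONTROL.
            System.out.println(grant.getGrantee().getIdentifier() + " -> " + grant.getPermission());
        }
    }
}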
For Spark to access the S3 filesystem, set the proper configuration as in the example below:
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3a.fast.upload","true")
hadoopConf.set("mapreduce.fileoutputcommitter.algorithm.version","2")
hadoopConf.set("fs.s3a.server-side-encryption-algorithm", "AES256")
hadoopConf.set("fs.s3a.canned.acl","BucketOwnerFullControl")
hadoopConf.set("fs.s3a.acl.default","BucketOwnerFullControl")
If you are using EMR then you have to use the AWS team's S3 connector, with "s3://" URLs, and their documented configuration options. They don't support the Apache one, so any option starting with "fs.s3a" isn't going to have any effect whatsoever.
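For illustration, a minimal sketch of the same write path through the EMR connector (my own example, not from this answer; the bucket name is a placeholder and fs.s3.canned.acl is the canned-ACL option documented for EMRFS):
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class EmrFsAclSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // EMR's connector reads fs.s3.* options; the fs.s3a.* ones are ignored there.
        conf.set("fs.s3.canned.acl", "BucketOwnerFullControl");
        // "external-bucket" is a placeholder; on EMR, s3:// resolves to the EMRFS connector.
        FileSystem fs = FileSystem.get(URI.create("s3://external-bucket/output/"), conf);
        System.out.println("Writing through " + fs.getClass().getName());
    }
}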