I have a map-reduce application running on AWS EMR that writes some output to an S3 bucket owned by a different AWS account. I have the permissions set up and the job can write to the external bucket, but the object owner is still the root of the account where the Hadoop job is running. I would like to change this to the external account that owns the bucket.
I found that I can set fs.s3a.acl.default to bucket-owner-full-control, but that doesn't seem to be working. This is what I am doing:
// conf is an org.apache.hadoop.conf.Configuration; s3Path, filePath and
// contentAsString are defined elsewhere in the job.
conf.set("fs.s3a.acl.default", "bucket-owner-full-control");
FileSystem fileSystem = FileSystem.get(URI.create(s3Path), conf);
FSDataOutputStream fsDataOutputStream = fileSystem.create(new Path(filePath));
PrintWriter writer = new PrintWriter(fsDataOutputStream);
writer.write(contentAsString);
writer.close();
fsDataOutputStream.close();
Any help is appreciated.
conf.set("fs.s3a.acl.default", "bucket-owner-full-control");
is the right property to set.
It is the core-site.xml property that gives full control to the bucket owner:
<property>
  <name>fs.s3a.acl.default</name>
  <value>bucket-owner-full-control</value>
  <description>Set a canned ACL for newly created and copied objects. Value may be private,
    public-read, public-read-write, authenticated-read, log-delivery-write,
    bucket-owner-read, or bucket-owner-full-control.</description>
</property>
BucketOwnerFullControl
Specifies that the owner of the bucket is granted Permission.FullControl. The owner of the bucket is not necessarily the same as the owner of the object.
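To make the effect concrete, here is a minimal sketch at the SDK level (my illustration, not part of the original answer; it assumes the AWS SDK for Java v1 and placeholder bucket, key and file names). The Hadoop properties above essentially make the connector attach this canned ACL to the PutObject requests it issues.
import java.io.File;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.CannedAccessControlList;
import com.amazonaws.services.s3.model.PutObjectRequest;

public class CannedAclPutExample {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // Placeholder bucket, key and local file, for illustration only.
        PutObjectRequest request =
                new PutObjectRequest("external-bucket", "output/part-00000", new File("part-00000"))
                        .withCannedAcl(CannedAccessControlList.BucketOwnerFullControl);
        // The bucket owner is granted FULL_CONTROL on the new object.
        s3.putObject(request);
    }
}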
I also recommend setting fs.s3.canned.acl to BucketOwnerFullControl.
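For example (a small sketch, not from the original answer; conf is the same Hadoop Configuration as in the question, and fs.s3.canned.acl is the canned-ACL option read by EMR's own s3:// connector):
// Canned ACL for EMR's s3:// (EMRFS) connector; note the enum-style value.
conf.set("fs.s3.canned.acl", "BucketOwnerFullControl");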
For debugging, you can use the snippet below to see which parameters are actually being passed:
// Configuration is iterable as java.util.Map.Entry<String, String> pairs.
for (Entry<String, String> entry : conf) {
    System.out.printf("%s=%s\n", entry.getKey(), entry.getValue());
}
For testing purposes, run this command from the command line:
aws s3 cp s3://bucket/source/dummyfile.txt s3://bucket/target/dummyfile.txt --sse --acl bucket-owner-full-control
If this works from the CLI, it should also work through the API.
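To confirm which ACL actually landed on the object, one option (a sketch of my own, assuming the AWS SDK for Java v1 and the same placeholder bucket/key as the command above) is to read the object ACL back and print its grants:
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.AccessControlList;
import com.amazonaws.services.s3.model.Grant;

public class CheckObjectAcl {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // Placeholder bucket/key; point this at the object written by the test copy above.
        AccessControlList acl = s3.getObjectAcl("bucket", "target/dummyfile.txt");
        for (Grant grant : acl.getGrantsAsList()) {
            // Expect the bucket owner's canonical ID with FULL_CONTROL.
            System.out.println(grant.getGrantee().getIdentifier() + " -> " + grant.getPermission());
        }
    }
}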
For Spark to access the S3 filesystem, set the proper configuration as in the example below:
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3a.fast.upload","true")
hadoopConf.set("mapreduce.fileoutputcommitter.algorithm.version","2")
hadoopConf.set("fs.s3a.server-side-encryption-algorithm", "AES256")
hadoopConf.set("fs.s3a.canned.acl","BucketOwnerFullControl")
hadoopConf.set("fs.s3a.acl.default","BucketOwnerFullControl")
If you are using EMR then you have to use the AWS team's S3 connector, with "s3://" URLs, and their documented configuration options. They don't support the Apache one, so any option starting with "fs.s3a" isn't going to have any effect whatsoever.
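For illustration, a minimal sketch of the same write path through the EMR connector (my own example, not from this answer; the bucket name is a placeholder and fs.s3.canned.acl is the canned-ACL option documented for EMRFS):
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class EmrFsAclSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // EMR's connector reads fs.s3.* options; the fs.s3a.* ones are ignored there.
        conf.set("fs.s3.canned.acl", "BucketOwnerFullControl");
        // "external-bucket" is a placeholder; on EMR, s3:// resolves to the EMRFS connector.
        FileSystem fs = FileSystem.get(URI.create("s3://external-bucket/output/"), conf);
        System.out.println("Writing through " + fs.getClass().getName());
    }
}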