s3a:// fails while s3:// works in Spark on EMR

I'm using EMR 5.5.0 with Spark. If I write a simple file to S3 using an s3://... URL, it writes fine. But if I use an s3a://... URL, it fails with: Service: Amazon S3; Status Code: 403; Error Code: AccessDenied

Using the AWS command line, I'm able to cp, mv, and rm any file in the path I'm writing to. But from Spark, s3a fails on the PUT request.

We have server-side encryption enabled, and I know Spark is picking it up because the s3:// URLs work. Any ideas?

Failed PUT DEBUG logs here. It may be important to note that I'm doing an rdd.saveAsTextFile(path), but the PUT says it's trying to write to /my-bucket/tmp/carlos/testWrite/4/_temporary/0/, which I thought it would only do for Parquet. Not sure if that detail is relevant, but I thought I would mention it.

asked Aug 11 '17 14:08

Carlos Bribiescas

1 Answer

s3a is the actively maintained S3 client in Apache Hadoop. AWS forked their own client off from the Apache s3n:// client many years ago and have (presumably) massively reworked theirs.

Both can read and write the same data, but some parts of EMR expect extra methods in the filesystem client which only the EMR s3:// client supports, so you cannot safely use s3a on EMR.
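If the same job has to run both on EMR and elsewhere, one way to act on this is to normalise the URL scheme before writing. The following is only a minimal sketch: the helper name `to_emr_scheme` is made up for illustration, and it assumes the only difference between your paths is the `s3a://` prefix.

```python
def to_emr_scheme(path: str) -> str:
    """Rewrite an s3a:// URL to the s3:// scheme used by the EMR client.

    Hypothetical helper for illustration; any other scheme is returned unchanged.
    """
    prefix = "s3a://"
    if path.startswith(prefix):
        return "s3://" + path[len(prefix):]
    return path


# Example with the path from the question:
out_path = to_emr_scheme("s3a://my-bucket/tmp/carlos/testWrite/4")
# On EMR you would then call rdd.saveAsTextFile(out_path) as before.
```

Off EMR you would skip the rewrite and keep s3a://.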

There's also the original ASF s3:// client, which is incompatible with everything else but was the first code used to connect Hadoop with S3, well before EMR was an Amazon product.

Which is better? As of Aug 2017, S3A is probably faster on aggressive read IO of columnar formats like ORC and Parquet. EMR's s3:// client, with EMRFS, probably has the edge in terms of resilience and consistency, but the open source ASF S3A client is moving to address those.
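If you experiment with s3a anyway, note that it does not read the EMR client's settings. Since the question mentions server-side encryption, it may be worth checking whether s3a has been told to request it. The sketch below assumes SSE-S3 (AES256) and uses the Hadoop S3A property `fs.s3a.server-side-encryption-algorithm`, passed through Spark's `spark.hadoop.` prefix. The helper name is hypothetical, and whether this is the cause of the 403 here is a guess, not something the logs confirm.

```python
def s3a_sse_settings(algorithm: str = "AES256") -> dict:
    """Build Spark settings asking the S3A client to request server-side encryption.

    Hypothetical helper for illustration. The "spark.hadoop." prefix makes Spark
    copy the property into the Hadoop configuration used by the filesystem client.
    """
    return {
        "spark.hadoop.fs.s3a.server-side-encryption-algorithm": algorithm,
    }


settings = s3a_sse_settings()

# Usage sketch (requires pyspark, so it is left as a comment):
#   from pyspark import SparkConf
#   conf = SparkConf().setAll(list(settings.items()))
```

The same property can also be passed with `--conf` on spark-submit instead of being set in code.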

answered Oct 06 '22 18:10

stevel

Tags:

amazon-web-services

amazon-s3

apache-spark
