Amazon s3a returns 400 Bad Request with Spark

For checkpointing purposes, I am trying to use an Amazon S3 bucket as the checkpoint directory.

val checkpointDir = "s3a://bucket-name/checkpoint.txt"
val sc = new SparkContext(conf)
sc.setLocalProperty("spark.default.parallelism", "30")
sc.hadoopConfiguration.set("fs.s3a.access.key", "xxxxx")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "xxxxx")
sc.hadoopConfiguration.set("fs.s3a.endpoint", "bucket-name.s3-website.eu-central-1.amazonaws.com")
val ssc = new StreamingContext(sc, Seconds(10))
ssc.checkpoint(checkpointDir)

But it stops with this exception:

Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: 9D8E8002H3BBDDC7, AWS Error Code: null, AWS Error Message: Bad Request, S3 Extended Request ID: Qme5E3KAr/KX0djiq9poGXPJkmr0vuXAduZujwGlvaAl+oc6vlUpq7LIh70IF3LNgoewjP+HnXA=
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:154)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.spark.streaming.StreamingContext.checkpoint(StreamingContext.scala:232)
at com.misterbell.shiva.StreamingApp$.main(StreamingApp.scala:89)
at com.misterbell.shiva.StreamingApp.main(StreamingApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I don't understand why I get this error, and I can't find any examples.

asked Dec 10 '15 18:12

crak

2 Answers

This message corresponds to something like a "bad endpoint" or an unsupported signature version.

Frankfurt is the only region that does not support Signature Version 2, and it is the one I picked.

Even after all my research, I can't say exactly what the signature version is; it is not obvious from the documentation. But V2 seems to work with s3a.

The endpoint shown in the S3 console is not the real endpoint; it is just the website endpoint.

You have to use one of the actual S3 endpoints, like this:

sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3.eu-west-1.amazonaws.com")

It also works by default with the US endpoint.
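To make the distinction between the website endpoint and the real S3 endpoint concrete, here is a minimal sketch. The object S3Endpoints and both of its methods are hypothetical helpers, not part of Spark, Hadoop, or the AWS SDK. It assumes the common "s3.<region>.amazonaws.com" naming, and it assumes that website endpoints can be recognised by the "s3-website" fragment in the host name.

```scala
// Hypothetical helper, not part of Spark, Hadoop, or the AWS SDK.
object S3Endpoints {

  // Static-website endpoints contain "s3-website" in the host name.
  // They serve web pages and are not meant for the S3 API that s3a uses.
  def isWebsiteEndpoint(endpoint: String): Boolean =
    endpoint.toLowerCase.contains("s3-website")

  // Builds a regional API endpoint.
  // Assumption: the region follows the "s3.<region>.amazonaws.com" pattern.
  def regionalEndpoint(region: String): String = {
    require(region.trim.nonEmpty, "region must not be empty")
    s"s3.${region.trim.toLowerCase}.amazonaws.com"
  }
}

// Usage with an existing SparkContext `sc`, as in the question:
// sc.hadoopConfiguration.set("fs.s3a.endpoint", S3Endpoints.regionalEndpoint("eu-west-1"))
```

With this check, the endpoint from the question ("bucket-name.s3-website.eu-central-1.amazonaws.com") would be flagged as a website endpoint before it ever reaches fs.s3a.endpoint.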

answered Sep 24 '22 16:09

crak

If you still want to use a region that only supports Signature V4 with Spark, you can pass the flag -Dcom.amazonaws.services.s3.enableV4 to the driver and executor Java options at runtime. For example:

spark-submit --conf spark.driver.extraJavaOptions='-Dcom.amazonaws.services.s3.enableV4' \
    --conf spark.executor.extraJavaOptions='-Dcom.amazonaws.services.s3.enableV4' \
    ... (other spark options)

With these settings, Spark is able to write to Frankfurt (and other V4-only regions) even with a not-so-fresh AWS SDK version (com.amazonaws:aws-java-sdk:1.7.4 in my case).
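For completeness, here is a rough sketch of how the same flag might be combined with the question's checkpoint setup from inside the application. Several things here are assumptions and not confirmed by the answers: that setting the system property in the driver before the first S3 client is created has the same effect as the -D flag, that the executor option is honoured when set on SparkConf before the context starts, and that "s3.eu-central-1.amazonaws.com" is the right endpoint for a Frankfurt bucket. The object name and app name are hypothetical. Passing the flags on the spark-submit command line remains the approach the answer actually tested.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical application object for illustration only.
object V4CheckpointSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: executor JVMs pick up this option because it is set
    // before the SparkContext is created.
    val conf = new SparkConf()
      .setAppName("v4-checkpoint-sketch") // hypothetical app name
      .set("spark.executor.extraJavaOptions", "-Dcom.amazonaws.services.s3.enableV4")

    // Assumption: the AWS SDK reads this property when it creates its S3
    // client, so it must be set before the first s3a access in the driver.
    System.setProperty("com.amazonaws.services.s3.enableV4", "true")

    val sc = new SparkContext(conf)
    sc.hadoopConfiguration.set("fs.s3a.access.key", "xxxxx")
    sc.hadoopConfiguration.set("fs.s3a.secret.key", "xxxxx")
    // Regional API endpoint instead of the website endpoint from the question.
    sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com")

    val ssc = new StreamingContext(sc, Seconds(10))
    ssc.checkpoint("s3a://bucket-name/checkpoint")
  }
}
```

The master URL is left out on purpose, on the assumption that the job is launched through spark-submit as in the stack trace above.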

answered Sep 26 '22 16:09

Mariusz

Tags:

amazon-web-services

amazon-s3

apache-spark

hdfs

spark-streaming
