Below are my versions for everything
<spark.version>2.3.1</spark.version>
<scala.version>2.11.8</scala.version>
<hadoop.version>2.7.7</hadoop.version>
<artifactId>aws-java-sdk</artifactId>
<version>1.7.4</version>
And I have the following code that is submitted to spark-submit as part of a fat jar.
spark.sparkContext.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
spark.sparkContext.hadoopConfiguration.set("log4j.logger.org.apache.hadoop.fs.s3a", "DEBUG")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.server-side-encryption-algorithm", "AES256")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", endpoint)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", access)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", secret)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.session.token", session)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.proxy.host", proxyHost)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.proxy.port", proxyPort.toString)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.proxy.username", proxyUser)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.proxy.password", proxyPass)
val credentials = new StaticCredentialsProvider(new BasicSessionCredentials(access, secret, session))
val config = new ClientConfiguration()
.withProxyHost(proxyHost)
.withProxyPort(proxyPort)
.withProxyUsername(proxyUser)
.withProxyPassword(proxyPass)
val s3Client = new AmazonS3Client(credentials, config)
s3Client.setEndpoint(endpoint)
val `object` = s3Client.getObject(new GetObjectRequest(bucket, key))
val objectData = `object`.getObjectContent
println("This works! :) " + objectData.toString)
val json = spark.read.textFile("s3a://" + bucket + "/" + key)
println("Error before here :( " + json)
The call using the AmazonS3Client works
This works! :) com.amazonaws.services.s3.model.S3ObjectInputStream@3f736a16
But I get the below error leveraging s3a
2018-09-12 20:45:59 INFO S3AFileSystem:1207 - Caught an AmazonServiceException com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: D8A113B7B1AB31B9, AWS Error Code: null, AWS Error Message: Forbidden, S3 Extended Request ID: AybHBDYJCeWlw2brLdL0Ezpg5PNTUs9kxUqr17xR6qnv3WTxUQ0T1Vs78aM9mG8bsjTzguePZG0=
2018-09-12 20:45:59 INFO S3AFileSystem:1208 - Error Message: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: D8A113B7B1AB31B9, AWS Error Code: null, AWS Error Message: Forbidden
2018-09-12 20:45:59 INFO S3AFileSystem:1209 - HTTP Status Code: 403
2018-09-12 20:45:59 INFO S3AFileSystem:1210 - AWS Error Code: null
2018-09-12 20:45:59 INFO S3AFileSystem:1211 - Error Type: Client
2018-09-12 20:45:59 INFO S3AFileSystem:1212 - Request ID: D8A113B7B1AB31B9
2018-09-12 20:45:59 INFO S3AFileSystem:1213 - Stack
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: D8A113B7B1AB31B9, AWS Error Code: null, AWS Error Message: Forbidden, S3 Extended Request ID: AybHBDYJCeWlw2brLdL0Ezpg5PNTUs9kxUqr17xR6qnv3WTxUQ0T1Vs78aM9mG8bsjTzguePZG0=
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:976)
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:956)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:892)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1439)
at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:46)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:693)
at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:732)
at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:702)
at com.company.HelloWorld$.main(HelloWorld.scala:77)
at com.company.HelloWorld.main(HelloWorld.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
As far as I know they should be configured identically. So I am at a loss as to why the client works but s3a is getting a 403 error?
I managed to fix the issue by removing AWS Java SDK
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk</artifactId>
<version>1.7.4</version>
</dependency>
and replacing it with the 2.8.1 version of Hadoop AWS dependency.
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-aws</artifactId>
<version>2.8.1</version>
</dependency>
Nothing obvious. That log4j setting needs to go to log4j.properties, but as the authentication chain deliberately avoids logging anything useful, it won't help that much
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With