ISSUE:
I can download the file successfully with the AWS CLI as well as with boto3. However, when reading it through the Hadoop/Spark S3A connector, I get the error below:
py4j.protocol.Py4JJavaError: An error occurred while calling o24.parquet.
: com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: BCFFD14CB2939D68, AWS Error Code: null, AWS Error Message: Forbidden, S3 Extended Request ID: MfT8J6ZPlJccgHBXX+tX1fpX47V7dWCP3Dq+W9+IBUfUhsD4Nx+DcyqsbgbKsPn8NZzjc2U
Configuration: running this on my local machine
Spark Version 2.4.4
Hadoop Version 2.7
Jars added:
hadoop-aws-2.7.3.jar
aws-java-sdk-1.7.4.jar
Hadoop Config:
hadoop_conf.set("fs.s3a.access.key", access_key)
hadoop_conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoop_conf.set("fs.s3a.secret.key", secret_key)
hadoop_conf.set("fs.s3a.aws.credentials.provider","org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
hadoop_conf.set("fs.s3a.session.token", session_key)
hadoop_conf.set("fs.s3a.endpoint", "s3-us-west-2.amazonaws.com") # yes, I am using central eu server.
hadoop_conf.set("com.amazonaws.services.s3.enableV4", "true")
Code to Read the file:
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext  # SQLContext lives in pyspark.sql, not in the top-level pyspark package
sc = SparkContext.getOrCreate()
hadoop_conf = sc._jsc.hadoopConfiguration()  # the fs.s3a.* settings above are applied to this object before the read
sqlContext = SQLContext(sc)
df = sqlContext.read.parquet(path)  # path is an s3a:// URL to the parquet file
print(df.head())
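For comparison, a minimal boto3 sketch of the download that does succeed (the bucket and key are placeholders; access_key, secret_key, and session_key are the same temporary credentials used above):
import boto3
s3 = boto3.client(
    "s3",
    region_name="us-west-2",
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    aws_session_token=session_key,
)
# placeholder bucket/key -- substitute your own
s3.download_file("my-bucket", "path/to/file.parquet", "/tmp/file.parquet")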
The "403 Forbidden" error can occur due to the following reasons: Permissions are missing for s3:PutObject to add an object or s3:PutObjectAcl to modify the object's ACL. You don't have permission to use an AWS Key Management Service (AWS KMS) key. There is an explicit deny statement in the bucket policy.
The HTTP 403 Forbidden status code indicates that the server understood the request but refuses to authorize it. It is similar to 401, but with 403 re-authenticating makes no difference: the credentials are recognized, they simply lack permission.
If you're getting Access Denied errors on public read requests that should be allowed, review the Amazon S3 Block Public Access settings at both the account and the bucket level; these settings can override permissions that otherwise allow public read access.
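Both levels can be inspected programmatically; a short sketch with boto3 (bucket name and account ID are placeholders):
import boto3
s3 = boto3.client("s3")
# bucket-level settings
print(s3.get_public_access_block(Bucket="my-bucket")["PublicAccessBlockConfiguration"])
# account-level settings go through the s3control API
s3control = boto3.client("s3control")
print(s3control.get_public_access_block(AccountId="123456789012")["PublicAccessBlockConfiguration"])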
If you're trying to host a static website using Amazon S3 but you're getting an Access Denied error, check the following requirements:
Objects in the bucket must be publicly accessible.
The S3 bucket policy must allow access to the s3:GetObject action.
The AWS account that owns the bucket must also own the object.
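The s3:GetObject requirement can be verified by dumping the bucket policy, for example (placeholder bucket name):
import json
import boto3
s3 = boto3.client("s3")
policy = json.loads(s3.get_bucket_policy(Bucket="my-bucket")["Policy"])
for stmt in policy.get("Statement", []):
    # look for an Allow on s3:GetObject
    print(stmt.get("Effect"), stmt.get("Principal"), stmt.get("Action"))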
Solution: set the AWS credentials provider to the profile credentials provider, so S3A picks up the same credentials that already work for the AWS CLI and boto3:
hadoopConf.set("fs.s3a.aws.credentials.provider", "com.amazonaws.auth.profile.ProfileCredentialsProvider")