ISSUE:
I can download the file successfully with the AWS CLI as well as with boto3. However, when reading it through the Hadoop/Spark S3A connector, I get the error below:
py4j.protocol.Py4JJavaError: An error occurred while calling o24.parquet.
: com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: BCFFD14CB2939D68, AWS Error Code: null, AWS Error Message: Forbidden, S3 Extended Request ID: MfT8J6ZPlJccgHBXX+tX1fpX47V7dWCP3Dq+W9+IBUfUhsD4Nx+DcyqsbgbKsPn8NZzjc2U
Configuration: running this on my local machine
Spark Version 2.4.4
Hadoop Version 2.7
Jars added:
hadoop-aws-2.7.3.jar
aws-java-sdk-1.7.4.jar
Hadoop Config:
hadoop_conf.set("fs.s3a.access.key", access_key)
hadoop_conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoop_conf.set("fs.s3a.secret.key", secret_key)
hadoop_conf.set("fs.s3a.aws.credentials.provider","org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
hadoop_conf.set("fs.s3a.session.token", session_key)
hadoop_conf.set("fs.s3a.endpoint", "s3-us-west-2.amazonaws.com") # yes, I am using central eu server.
hadoop_conf.set("com.amazonaws.services.s3.enableV4", "true")
Code to Read the file:
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext  # SQLContext lives in pyspark.sql, not in the top-level pyspark package
sc = SparkContext.getOrCreate()
hadoop_conf = sc._jsc.hadoopConfiguration()  # the fs.s3a.* settings above are applied to this object before the read
sqlContext = SQLContext(sc)
df = sqlContext.read.parquet(path)  # path is an s3a:// URL to the parquet file
print(df.head())
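For comparison, a minimal boto3 sketch of the download that does succeed (the bucket and key are placeholders; access_key, secret_key, and session_key are the same temporary credentials used above):
import boto3
s3 = boto3.client(
    "s3",
    region_name="us-west-2",
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    aws_session_token=session_key,
)
# placeholder bucket/key -- substitute your own
s3.download_file("my-bucket", "path/to/file.parquet", "/tmp/file.parquet")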
The "403 Forbidden" error can occur due to the following reasons: Permissions are missing for s3:PutObject to add an object or s3:PutObjectAcl to modify the object's ACL. You don't have permission to use an AWS Key Management Service (AWS KMS) key. There is an explicit deny statement in the bucket policy.
The HTTP 403 Forbidden status code indicates that the server understood the request but refuses to authorize it. It is similar to 401, but with 403 re-authenticating makes no difference: the credentials are recognized, they simply lack permission.
If you're getting Access Denied errors on public read requests that should be allowed, review the Amazon S3 Block Public Access settings at both the account and the bucket level; these settings can override permissions that otherwise allow public read access.
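Both levels can be inspected programmatically; a short sketch with boto3 (bucket name and account ID are placeholders):
import boto3
s3 = boto3.client("s3")
# bucket-level settings
print(s3.get_public_access_block(Bucket="my-bucket")["PublicAccessBlockConfiguration"])
# account-level settings go through the s3control API
s3control = boto3.client("s3control")
print(s3control.get_public_access_block(AccountId="123456789012")["PublicAccessBlockConfiguration"])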
If you're trying to host a static website using Amazon S3 but you're getting an Access Denied error, check the following requirements:
Objects in the bucket must be publicly accessible.
The S3 bucket policy must allow access to the s3:GetObject action.
The AWS account that owns the bucket must also own the object.
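The s3:GetObject requirement can be verified by dumping the bucket policy, for example (placeholder bucket name):
import json
import boto3
s3 = boto3.client("s3")
policy = json.loads(s3.get_bucket_policy(Bucket="my-bucket")["Policy"])
for stmt in policy.get("Statement", []):
    # look for an Allow on s3:GetObject
    print(stmt.get("Effect"), stmt.get("Principal"), stmt.get("Action"))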
Solution: set the AWS credentials provider to the profile credentials provider, so S3A picks up the same credentials that already work for the AWS CLI and boto3:
hadoopConf.set("fs.s3a.aws.credentials.provider", "com.amazonaws.auth.profile.ProfileCredentialsProvider")