Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

403 Error while accessing s3a using Spark

ISSUE:

Able to successfully download the file using AWS CLI as well as boto 3. However, while using the S3A connector of Hadoop/Spark , receiving the below error:

py4j.protocol.Py4JJavaError: An error occurred while calling o24.parquet.
: com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: BCFFD14CB2939D68, AWS Error Code: null, AWS Error Message: Forbidden, S3 Extended Request ID: MfT8J6ZPlJccgHBXX+tX1fpX47V7dWCP3Dq+W9+IBUfUhsD4Nx+DcyqsbgbKsPn8NZzjc2U

Configuration: Running this on my Local machine

  1. Spark Version 2.4.4

  2. Hadoop Version 2.7

Jars added:

  1. hadoop-aws-2.7.3.jar

  2. aws-java-sdk-1.7.4.jar

Hadoop Config:

hadoop_conf.set("fs.s3a.access.key", access_key)
hadoop_conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoop_conf.set("fs.s3a.secret.key", secret_key)
hadoop_conf.set("fs.s3a.aws.credentials.provider","org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
hadoop_conf.set("fs.s3a.session.token", session_key)
hadoop_conf.set("fs.s3a.endpoint", "s3-us-west-2.amazonaws.com") # yes, I am using central eu server.
hadoop_conf.set("com.amazonaws.services.s3.enableV4", "true")

Code to Read the file:

from pyspark import SparkConf, SparkContext, SQLContext
sc = SparkContext.getOrCreate()
hadoop_conf=sc._jsc.hadoopConfiguration()
sqlContext = SQLContext(sc)
df = sqlContext.read.parquet(path)
print(df.head())
like image 592
Ajeet2040 Avatar asked Mar 02 '20 13:03

Ajeet2040


People also ask

Why am I getting an HTTP 403 Forbidden error when I try to upload files using the Amazon S3 console?

The "403 Forbidden" error can occur due to the following reasons: Permissions are missing for s3:PutObject to add an object or s3:PutObjectAcl to modify the object's ACL. You don't have permission to use an AWS Key Management Service (AWS KMS) key. There is an explicit deny statement in the bucket policy.

What is a 403 error code?

The HTTP 403 Forbidden response status code indicates that the server understands the request but refuses to authorize it. This status is similar to 401 , but for the 403 Forbidden status code re-authenticating makes no difference.

Why is my S3 bucket Access Denied?

If you're getting Access Denied errors on public read requests that are allowed, check the bucket's Amazon S3 Block Public Access settings. Review the S3 Block Public Access settings at both the account and bucket level. These settings can override permissions that allow public read access.

Why is S3 object URL Access Denied?

If you're trying to host a static website using Amazon S3, but you're getting an Access Denied error, check the following requirements: Objects in the bucket must be publicly accessible. S3 bucket policy must allow access to the s3:GetObject action. The AWS account that owns the bucket must also own the object.


1 Answers

Set AWS credentials provider to profile credentials:

hadoopConf.set("fs.s3a.aws.credentials.provider", "com.amazonaws.auth.profile.ProfileCredentialsProvider")
like image 176
alperozaydin Avatar answered Oct 10 '22 07:10

alperozaydin