Google Cloud Storage requires storage.objects.create permission when reading from pyspark

I'm trying to read a pyspark DataFrame from Google Cloud Storage, but I keep getting an error saying the service account lacks the storage.objects.create permission. The account deliberately does not have WRITER permissions, since it only reads parquet files:

spark_session.read.parquet(input_path)

18/12/25 13:12:00 INFO com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl: Repairing batch of 1 missing directories.
18/12/25 13:12:01 ERROR com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl: Failed to repair some missing directories.
com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "***.gserviceaccount.com does not have storage.objects.create access to ***.",
    "reason" : "forbidden"
  } ],
  "message" : "***.gserviceaccount.com does not have storage.objects.create access to ***."
}
asked Jan 27 '23 by Yoav

1 Answer

We found the issue: it's caused by the implicit directory auto-repair feature in the GCS connector, which attempts to create "missing" directory placeholder objects even on read. We disabled this behavior by setting fs.gs.implicit.dir.repair.enable to false.

answered Feb 11 '23 by Yoav