I'm trying to read a PySpark DataFrame from Google Cloud Storage, but I keep getting an error that the service account lacks storage.objects.create permission. The account deliberately does not have WRITER permissions, since it is only reading Parquet files:
spark_session.read.parquet(input_path)
18/12/25 13:12:00 INFO com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl: Repairing batch of 1 missing directories.
18/12/25 13:12:01 ERROR com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl: Failed to repair some missing directories.
com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "***.gserviceaccount.com does not have storage.objects.create access to ***.",
    "reason" : "forbidden"
  } ],
  "message" : "***.gserviceaccount.com does not have storage.objects.create access to ***."
}
We found the issue. It is caused by the implicit directory repair feature in the GCS connector: when the connector thinks parent "directory" objects are missing, it tries to create them, which requires storage.objects.create. We disabled this behavior by setting fs.gs.implicit.dir.repair.enable to false.
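For reference, here is roughly how the setting can be applied when building the Spark session. The property name fs.gs.implicit.dir.repair.enable comes from the GCS connector; the spark.hadoop. prefix is the usual way to pass Hadoop properties through Spark, and the app name below is just a placeholder:

from pyspark.sql import SparkSession

spark_session = (
    SparkSession.builder
    .appName("gcs-read-only")  # placeholder app name
    # Disable the GCS connector's implicit directory repair so a
    # read-only job never attempts storage.objects.create calls.
    .config("spark.hadoop.fs.gs.implicit.dir.repair.enable", "false")
    .getOrCreate()
)

df = spark_session.read.parquet(input_path)  # input_path: the gs:// path being read

Alternatively, the property can be set on an existing session's Hadoop configuration before reading, e.g. via spark_session.sparkContext._jsc.hadoopConfiguration().set("fs.gs.implicit.dir.repair.enable", "false").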