Permissions Issue with Google Cloud Data Fusion

Question

I'm following the instructions in the Cloud Data Fusion sample tutorial, and everything works until I try to run the pipeline at the very end. The Cloud Data Fusion Service API permissions are set for the Google-managed service account as per the instructions, and the pipeline preview function works without any issues.

However, when I deploy and run the pipeline, it fails after a couple of minutes. Shortly after the status changes from provisioning to running, the pipeline stops with the following permissions error:

    com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
    {
      "code" : 403,
      "errors" : [ {
        "domain" : "global",
        "message" : "xxxxxxxxxxx-compute@developer.gserviceaccount.com does not have storage.buckets.create access to project X.",
        "reason" : "forbidden"
      } ],
      "message" : "xxxxxxxxxxx-compute@developer.gserviceaccount.com does not have storage.buckets.create access to project X."
    }

xxxxxxxxxxx-compute@developer.gserviceaccount.com is the default Compute Engine service account for my project.

"Project X" is not one of mine, though, and I have no idea why the pipeline startup code is trying to create a bucket there. It does successfully create temporary buckets (one called df-xxx and one called dataproc-xxx) in my project before it fails.
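To check which project and permission a 403 like this refers to, the error body can be parsed rather than read by eye. The following is a minimal sketch, assuming the message always has the shape "<member> does not have <permission> access to project <project>."; the function name `parse_denied` and the regular expression are illustrative and not part of any Google library.

```python
import json
import re

# Error body copied from the stack trace above (identifiers redacted as in the post).
ERROR_BODY = """
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "xxxxxxxxxxx-compute@developer.gserviceaccount.com does not have storage.buckets.create access to project X.",
    "reason" : "forbidden"
  } ],
  "message" : "xxxxxxxxxxx-compute@developer.gserviceaccount.com does not have storage.buckets.create access to project X."
}
"""

# Assumed message shape; other Google APIs may word their errors differently.
DENIED = re.compile(
    r"^(?P<member>\S+) does not have (?P<permission>\S+) "
    r"access to project (?P<project>.+?)\.?$"
)


def parse_denied(body: str) -> dict:
    """Extract the member, permission and project named in a 403 error body."""
    message = json.loads(body)["message"]
    match = DENIED.match(message)
    if match is None:
        raise ValueError(f"unrecognized error message: {message!r}")
    return match.groupdict()


details = parse_denied(ERROR_BODY)
print(details)
```

If the `project` value is not the project the pipeline runs in, granting roles in your own project will not help, which matches what I saw.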

I've tried this with two separate accounts and get the same error in both. I tried adding storage/admin roles to the various service accounts, to no avail, but that was before I realized the pipeline was attempting to access a different project entirely.

Derek · Accepted Answer

I believe I was able to reproduce this. The BigQuery Source plugin first creates a temporary working GCS bucket to export the data to, and I suspect that by default it attempts to create that bucket in the Dataset Project ID rather than in your own project, as it should.

As a workaround, create a GCS bucket in your own project, and then, in the BigQuery Source configuration of your pipeline, set the "Temporary Bucket Name" configuration to "gs://<your-bucket-name>".
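The bucket can be created in the console, or from a script. Below is a minimal sketch in Python, assuming the `google-cloud-storage` package is installed and application default credentials are configured; the helper names, `my-project-id`, and `my-df-temp-bucket` are hypothetical placeholders, and the name check is a simplified version of the GCS bucket naming rules.

```python
import re

# Simplified check: 3-63 characters, lowercase letters, digits, '-', '_' or '.',
# starting and ending with a letter or digit. The full GCS rules have more cases.
BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$")


def temporary_bucket_value(bucket_name: str) -> str:
    """Return the value to enter in the "Temporary Bucket Name" field."""
    if not BUCKET_NAME.match(bucket_name):
        raise ValueError(f"not a valid bucket name: {bucket_name!r}")
    return f"gs://{bucket_name}"


def create_temp_bucket(project_id: str, bucket_name: str) -> str:
    """Create the bucket in your own project and return its gs:// value."""
    value = temporary_bucket_value(bucket_name)
    # Imported lazily so the name helper works without the client library.
    from google.cloud import storage

    client = storage.Client(project=project_id)
    client.create_bucket(bucket_name)
    return value


if __name__ == "__main__":
    # Hypothetical bucket name; prints the value without touching GCS.
    print(temporary_bucket_value("my-df-temp-bucket"))
    # To actually create the bucket (needs credentials):
    # print(create_temp_bucket("my-project-id", "my-df-temp-bucket"))
```

The returned string is what goes into the "Temporary Bucket Name" setting, so the plugin no longer needs to create a bucket itself.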

Tags: google-cloud-platform, google-cloud-data-fusion, cdap

Helvick
