 

Google Cloud Dataflow: Failed to write a file to temp location

I am building an Apache Beam pipeline on Google Cloud Dataflow.

I am getting an error that Dataflow does not have permission to write to a temp directory.

[screenshot: the Dataflow error message; full text quoted in the edit below]

This is confusing, since Dataflow clearly can write to the bucket: it created the staging folder.

[screenshot: bucket contents showing the staging folder that Dataflow created]

Why would I be able to write to a staging folder, but not a temp folder?

I am running from within a Docker container on a Compute Engine instance. I am fully authenticated with my service account.

PROJECT=$(gcloud config list project --format "value(core.project)")
BUCKET=gs://$PROJECT-testing

python tests/prediction/run.py \
    --runner DataflowRunner \
    --project $PROJECT \
    --staging_location $BUCKET/staging \
    --temp_location $BUCKET/temp \
    --job_name $PROJECT-deepmeerkat \
    --setup_file tests/prediction/setup.py
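One way to narrow this down before launching the pipeline is to probe whether the active credentials can write under the same temp prefix. This is a sketch, not part of the original script: the bucket name defaults to the project ID from the error message purely for illustration, and `gsutil cp` is used with `-` to copy from stdin.

```shell
# Probe: can the current credentials write under the pipeline's temp prefix?
# PROJECT defaults to the project ID seen in the error message; use your own.
PROJECT="${PROJECT:-api-project-773889352370}"
BUCKET="gs://${PROJECT}-testing"
TEMP_PROBE="${BUCKET}/temp/write-probe.txt"

# gsutil cp reads from stdin when the source is "-".
if command -v gsutil >/dev/null 2>&1; then
  echo "probe" | gsutil cp - "$TEMP_PROBE" && gsutil rm "$TEMP_PROBE"
fi
```

If this succeeds interactively but the Dataflow job still fails, the problem is with the identities the Dataflow service uses, not the one you are authenticated as locally.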

EDIT

In response to @alex amato

  1. Does the bucket belong to the project or is it owned by another project? Yes, when I go to the home screen for the project, this is one of four buckets listed. I commonly upload data and interact with other Google Cloud services (e.g. the Cloud Vision API) from this bucket.

  2. Would you please provide the full error message.

    "(8d8bc4d7fc4a50bd): Failed to write a file to temp location 'gs://api-project-773889352370-testing/temp/api-project-773889352370-deepmeerkat.1498771638.913123'. Please make sure that the bucket for this directory exists, and that the project under which the workflow is running has the necessary permissions to write to it."

    "(8d8bc4d7fc4a5f8f): Workflow failed. Causes: (8d8bc4d7fc4a526c): One or more access checks for temp location or staged files failed. Please refer to other error messages for details. For more information on security and permissions, please see https://cloud.google.com/dataflow/security-and-permissions."

  3. Can you confirm that there isn't already an existing GCS object which matches the name of the GCS folder path you are trying to use?

Yes, there is no folder named temp in the bucket.

  4. Could you please verify that the permissions you have match the account you run as?

The bucket permissions include global admin access,

[screenshot: bucket permissions showing global admin access]

which matches my gcloud auth

[screenshot: gcloud auth list showing the same account]
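For anyone comparing the same two things from the CLI rather than the console, the bucket's access controls can be dumped directly. This is a sketch using the bucket name from the error message; substitute your own.

```shell
# Inspect the bucket's access controls from the CLI.
# Bucket name taken from the error message above for illustration.
BUCKET="gs://api-project-773889352370-testing"

if command -v gsutil >/dev/null 2>&1; then
  gsutil iam get "$BUCKET"   # IAM policy bindings on the bucket
  gsutil acl get "$BUCKET"   # legacy bucket ACLs (what the console permissions page shows)
fi
```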

Asked Oct 17 '22 by bw4sz


1 Answer

@chamikara was correct. Despite inheriting credentials from my service account, Cloud Dataflow needs its own credentials.

"Can you also give access to the cloudservices account (<project-number>@developer.gserviceaccount.com), as mentioned in cloud.google.com/dataflow/security-and-permissions."
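The grant the answer describes can be sketched as below. The project number here is illustrative, lifted from the project ID in the error message, and the `roles/storage.objectAdmin` role is one reasonable choice for letting the account write temp files; verify the account email and role against the linked security-and-permissions page for your setup.

```shell
# Grant the project's service account write access to the temp bucket.
# PROJECT_NUMBER is illustrative; look yours up with:
#   gcloud projects describe "$PROJECT" --format 'value(projectNumber)'
PROJECT_NUMBER=773889352370
CLOUDSERVICES_SA="${PROJECT_NUMBER}@developer.gserviceaccount.com"
BUCKET="gs://api-project-${PROJECT_NUMBER}-testing"

if command -v gsutil >/dev/null 2>&1; then
  # Bind the account to a storage role on the bucket's IAM policy.
  gsutil iam ch "serviceAccount:${CLOUDSERVICES_SA}:roles/storage.objectAdmin" "$BUCKET"
fi
```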

Answered Oct 20 '22 by bw4sz