I am trying to run a Spark job on a Google Dataproc cluster as follows:
gcloud dataproc jobs submit hadoop --cluster <cluster-name> \
--jar file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
--class org.apache.hadoop.examples.WordCount \
-- arg1 arg2
But the job throws this error:
(gcloud.dataproc.jobs.submit.spark) PERMISSION_DENIED: Request had insufficient authentication scopes.
How do I add the auth scopes required to run the job?
At a minimum, service accounts used with Cloud Dataproc need permissions to read and write to Google Cloud Storage, and to write to Google Cloud Logging.
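If the cluster's service account lacks those permissions, one way to grant them is an IAM policy binding. The sketch below assumes a hypothetical project ID and service account email, and uses the Dataproc Worker role, which bundles the Storage and Logging permissions Dataproc needs:

# Hypothetical project and service account; roles/dataproc.worker covers the
# Storage and Logging permissions Dataproc requires.
gcloud projects add-iam-policy-binding my-project \
--member="serviceAccount:my-dataproc-sa@my-project.iam.gserviceaccount.com" \
--role="roles/dataproc.worker"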
Click the cluster name on the Dataproc Clusters page in the Google Cloud console, click STOP and wait for the cluster to stop, then click START to start it again.
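The same stop/start cycle can also be done from the CLI; this is a sketch with a placeholder cluster name and region:

# Placeholder cluster name and region.
gcloud dataproc clusters stop my-cluster --region=us-central1
gcloud dataproc clusters start my-cluster --region=us-central1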
Usually this error occurs because you're running gcloud from inside a GCE VM whose credentials are limited by scopes controlled through VM metadata; gcloud installed on a local machine typically already uses broad credentials that cover all GCP operations.
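To confirm that limited scopes are the problem, you can check which scopes the VM's default service account actually has; a quick diagnostic, run from inside the VM, is to query the metadata server:

# Run from inside the GCE VM; lists the OAuth scopes granted to the VM's
# default service account.
curl -H "Metadata-Flavor: Google" \
"http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes"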
For Dataproc access, when creating the VM from which you're running gcloud, you need to specify --scopes cloud-platform from the CLI; if you're creating the VM from the Cloud Console UI, select "Allow full access to all Cloud APIs" instead.
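For example, a minimal sketch of creating such a VM from the CLI (instance name and zone are placeholders):

# Placeholder instance name and zone; cloud-platform allows access to all GCP
# APIs, still gated by the service account's IAM roles.
gcloud compute instances create my-gcloud-vm \
--zone=us-central1-a \
--scopes=cloud-platform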
As noted in the comments, nowadays you can also update the scopes of an existing GCE instance to add the cloud-platform scope.
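Changing scopes requires the instance to be stopped first; the following is a sketch with a placeholder instance name and zone:

# Placeholder instance name and zone; the instance must be stopped before
# its scopes can be changed.
gcloud compute instances stop my-gcloud-vm --zone=us-central1-a
gcloud compute instances set-service-account my-gcloud-vm \
--zone=us-central1-a \
--scopes=cloud-platform
gcloud compute instances start my-gcloud-vm --zone=us-central1-a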