What is the right way to pass credentials to Dataflow jobs?
Some of my Dataflow jobs need credentials to make REST calls and fetch/post processed data.
I am currently using environment variables to pass the credentials to the JVM, reading them into a Serializable object, and passing them to the DoFn implementation's constructor. I am not sure this is the right approach, since a Serializable class should not contain sensitive information.
Another option I considered is storing the credentials in GCS and retrieving them with a service account key file, but I wonder whether my job should be responsible for reading credentials from GCS at all.
Google Cloud Dataflow does not have native support for passing or storing secured secrets. However, you can use Cloud KMS and/or GCS, as you propose, to read a secret at runtime using your Dataflow service account credentials.
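As a rough sketch of the KMS half of that approach, the snippet below decrypts a KMS-encrypted blob (for example, a ciphertext file downloaded from GCS) using the worker's ambient service account credentials. It assumes the google-cloud-kms Java client is on the classpath; the project, location, key ring, and key names are hypothetical placeholders.

```java
import com.google.cloud.kms.v1.CryptoKeyName;
import com.google.cloud.kms.v1.DecryptResponse;
import com.google.cloud.kms.v1.KeyManagementServiceClient;
import com.google.protobuf.ByteString;

public class SecretDecryptor {

  // Decrypts a KMS-encrypted secret. The client picks up the Dataflow
  // worker's service account credentials automatically, so no key file
  // needs to be shipped with the job.
  public static String decrypt(byte[] ciphertext) throws Exception {
    try (KeyManagementServiceClient client = KeyManagementServiceClient.create()) {
      // Hypothetical resource names; replace with your own key.
      CryptoKeyName keyName =
          CryptoKeyName.of("my-project", "global", "my-key-ring", "my-key");
      DecryptResponse response =
          client.decrypt(keyName, ByteString.copyFrom(ciphertext));
      return response.getPlaintext().toStringUtf8();
    }
  }
}
```

The service account running the job needs the Cloud KMS CryptoKey Decrypter role on the key, and (if the ciphertext lives in GCS) read access to the bucket.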
If you read the credential at runtime from a DoFn, you can use the DoFn.Setup lifecycle API to read the value once and cache it for the lifetime of the DoFn.
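That pattern can be sketched as follows, assuming the Apache Beam Java SDK; readSecret() stands in for whatever KMS/GCS lookup you use, and the REST call is likewise a placeholder.

```java
import org.apache.beam.sdk.transforms.DoFn;

public class CallApiFn extends DoFn<String, String> {

  // transient: the field is not serialized with the DoFn, so the
  // secret never travels inside the serialized pipeline graph.
  private transient String apiKey;

  @Setup
  public void setup() {
    // Runs once per DoFn instance on the worker; the value is cached
    // for the lifetime of the instance across bundles.
    apiKey = readSecret(); // hypothetical KMS/GCS lookup
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    c.output(callRestApi(apiKey, c.element())); // hypothetical REST call
  }

  private static String readSecret() {
    throw new UnsupportedOperationException("fetch from KMS/GCS here");
  }

  private static String callRestApi(String key, String payload) {
    throw new UnsupportedOperationException("call your API here");
  }
}
```

Keeping the field transient and populating it in @Setup avoids the original concern about sensitive data sitting in a Serializable object.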
You can learn about various options for secret management in Google Cloud here: Secret management with Cloud KMS.