How to pass environment variables to Dataflow workers in the Python SDK

I am writing a custom sink with the Python SDK that stores data in AWS S3. Connecting to S3 requires credentials such as a secret key, but hard-coding them is a bad idea for security reasons. I would like the values to reach the Dataflow workers as environment variables. How can I do that?

Tadayasu Yotsu asked Oct 27 '16 13:10





1 Answer

Generally, to pass workers information that you don't want to hard-code, use PipelineOptions; see Creating Custom Options. Then, when constructing the pipeline, extract the parameters from your PipelineOptions object and pass them into your transform (e.g. into your DoFn or a sink).

However, for something as sensitive as a credential, passing it as a command-line argument may not be a good idea. I would recommend a more secure approach: put the credential into a file on GCS, and pass the name of the file as a PipelineOption. Then read the file from GCS programmatically whenever you need the credential, using GcsIO.
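A minimal sketch of reading such a file, assuming the credential is stored as a small JSON object and that the Beam SDK's GCP extras are installed on the workers. The helper names and the JSON key names (`aws_access_key_id`, `aws_secret_access_key`) are assumptions made for illustration, not something the answer prescribes.

```python
import json

# Assumed layout of the credential file; adjust to whatever you store.
REQUIRED_KEYS = {"aws_access_key_id", "aws_secret_access_key"}


def parse_credentials(raw):
    """Decode the file contents (bytes, assumed JSON) into a dict."""
    creds = json.loads(raw.decode("utf-8"))
    missing = REQUIRED_KEYS - creds.keys()
    if missing:
        raise ValueError("credential file is missing keys: %s" % sorted(missing))
    return creds


def load_credentials(gcs_path):
    """Read and parse a credential file such as gs://<bucket>/<object>."""
    # Imported here so that only code that actually reads from GCS needs
    # the GCP dependencies to be importable.
    from apache_beam.io.gcp.gcsio import GcsIO

    with GcsIO().open(gcs_path) as f:
        return parse_credentials(f.read())
```

Calling `load_credentials` from the worker side (for example when the DoFn or sink first needs the S3 client) means only the file name travels with the pipeline options, while the secret itself stays in GCS behind its access controls.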

jkff answered Oct 21 '22 21:10
