
How to pass passwords to spark on EMR

Say your Spark cluster, which runs on Amazon EMR, needs to access a PostgreSQL database. What is the best way to give it its login and password? These are some of the ways we have tried:

  • Have a configuration file on S3 with the login info (not ideal, since the password is in plain text on S3).
  • Pass it as an environment variable as part of spark-env in the EMR settings (does not work, since executors do not have access to those environment variables. It can be set in the Spark config with spark.executorEnv.[EnvironmentVariableName], but this would again require the password to be in plain text in the Spark config file, which would also have to be in S3).

Is there a better way I am missing?

user2944397 asked Nov 08 '22 02:11



1 Answer

You could use EC2 instance metadata to push the secret to each EC2 instance, then use GET calls against the metadata service to retrieve it. Ideally, create a login/password that is only valid for the lifespan of the EMR cluster, and delete it (or at least reset the password) after the cluster has been torn down.

AFAIK there's no explicit support for this in Spark, but if you get the AWS SDK on the classpath, you can use EC2MetadataUtils to work with it.
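If you'd rather not pull in the AWS SDK, the "GET calls" part can be sketched with plain HTTP against the instance metadata endpoint. This is only a rough sketch under assumptions that are not from the question: it assumes the secret has been made available as the instance's user data, that it is a small JSON document with the hypothetical keys db_user and db_password, and that the instance uses IMDSv2 (session token first, then the GET). The function names are made up for illustration.

```python
import json
import urllib.request

# Link-local address of the EC2 instance metadata service.
IMDS_BASE = "http://169.254.169.254/latest"


def fetch_user_data(urlopen=urllib.request.urlopen, timeout=2.0):
    """Return the instance user data as text, using an IMDSv2 session token.

    `urlopen` is injectable so the function can be exercised off-EC2.
    """
    # Step 1: request a short-lived session token (IMDSv2 requires a PUT).
    token_req = urllib.request.Request(
        IMDS_BASE + "/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode("utf-8")

    # Step 2: GET the user data, presenting the token.
    data_req = urllib.request.Request(
        IMDS_BASE + "/user-data",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urlopen(data_req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_db_credentials(raw):
    """Parse the assumed JSON layout: {"db_user": ..., "db_password": ...}."""
    doc = json.loads(raw)
    return doc["db_user"], doc["db_password"]
```

On the cluster you would call something like `user, password = parse_db_credentials(fetch_user_data())` inside the code that opens the JDBC connection, so the password never has to sit in a config file on S3. Keep in mind that anything running on the instance can read the metadata service, which is one more reason to use a short-lived login as suggested above.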

stevel answered Nov 15 '22 11:11
