Say your Spark cluster, which runs on Amazon EMR, needs to access a PostgreSQL database. What is the best way to give it its login and password? These are some of the ways we have tried:
Is there a better way I am missing?
You could use EC2 instance metadata to push the secret to each EC2 instance, then use GET calls to retrieve it. Ideally, create a login/password that is valid only for the lifespan of the EMR cluster, and delete it (or at least reset the password) after the cluster has been torn down.
AFAIK there's no explicit support for this in Spark, but if you get the AWS SDK on the classpath, you can use EC2MetadataUtils to work with it.
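To make the "GET calls" part concrete, here is a rough sketch in Python (handy if the job is PySpark) that uses only the standard library rather than EC2MetadataUtils. It assumes the secret has been made available as a small JSON document in the instance's user data. The `pg_user` and `pg_password` keys, and the function names, are made up for illustration and are not an AWS or Spark convention.

```python
import json
import urllib.request

# Link-local address of the EC2 instance metadata service (IMDS).
IMDS_BASE = "http://169.254.169.254/latest"


def fetch_user_data(timeout=2.0):
    """Return the instance's user data as text, using an IMDSv2 session token."""
    # IMDSv2 requires a PUT to obtain a short-lived session token first.
    token_req = urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode("utf-8")

    # The token is then passed as a header on the actual GET call.
    data_req = urllib.request.Request(
        f"{IMDS_BASE}/user-data",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(data_req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def jdbc_properties(user_data):
    """Build JDBC connection properties from a JSON user-data payload.

    Assumption: the payload looks like {"pg_user": "...", "pg_password": "..."}.
    These key names are hypothetical.
    """
    secret = json.loads(user_data)
    return {
        "user": secret["pg_user"],
        "password": secret["pg_password"],
        "driver": "org.postgresql.Driver",
    }
```

On the cluster you would then call something like `spark.read.jdbc(url, table, properties=jdbc_properties(fetch_user_data()))`, where `url` and `table` are your own JDBC URL and table name. Keeping the parsing separate from the HTTP call means the parsing can be tested off-cluster, since the metadata endpoint only answers from inside an EC2 instance.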