We want to set the AWS parameters that, from code, would be set via the SparkContext:
sc.hadoopConfiguration.set("fs.s3a.access.key", vault.user)
sc.hadoopConfiguration.set("fs.s3a.secret.key", vault.key)
However, we have a custom Spark launcher framework that requires all custom Spark configuration to be passed via --conf parameters on the spark-submit command line.
Is there a way to "notify" the SparkContext to apply --conf values to its hadoopConfiguration rather than to its general SparkConf? We are looking for something along the lines of
spark-submit --conf hadoop.fs.s3a.access.key $vault.user --conf hadoop.fs.s3a.secret.key $vault.key
or
spark-submit --conf hadoopConfiguration.fs.s3a.access.key $vault.user --conf hadoopConfiguration.fs.s3a.secret.key $vault.key
You can submit a Spark batch application in cluster mode (default) or client mode, either from inside the cluster or from an external client. In cluster mode (default), the application is submitted and the driver runs on a host in your driver resource group. The spark-submit syntax is --deploy-mode cluster.
Use spark://HOST:PORT for a standalone cluster, replacing HOST and PORT with the host and port of the standalone cluster. Use local to run locally with one worker thread. Use local[k] to run the application locally with k worker threads, where k is usually the number of cores you have locally.
You need to prefix Hadoop configs with spark.hadoop. on the command line (or in the SparkConf object). Spark strips the prefix and copies the remaining property into the Hadoop configuration. For example:
spark.hadoop.fs.s3a.access.key=value
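Applied to the question, the full command line might look like the sketch below. It is only an illustration: ACCESS_KEY, SECRET_KEY, the build_s3a_conf helper, com.example.MyApp, and my-app.jar are hypothetical names, not something from the original post. The script prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical placeholders; real values would come from your secret store.
ACCESS_KEY="${ACCESS_KEY:-my-access-key}"
SECRET_KEY="${SECRET_KEY:-my-secret-key}"

# Build the --conf arguments, prefixing each Hadoop property with "spark.hadoop."
# so that Spark copies it into the Hadoop configuration.
build_s3a_conf() {
  echo "--conf spark.hadoop.fs.s3a.access.key=$1 --conf spark.hadoop.fs.s3a.secret.key=$2"
}

# Dry run: print the command instead of executing it.
# Remove the leading "echo" to actually submit (class and jar names are assumptions).
echo spark-submit $(build_s3a_conf "$ACCESS_KEY" "$SECRET_KEY") \
  --class com.example.MyApp my-app.jar
```

Inside the application, sc.hadoopConfiguration.get("fs.s3a.access.key") should then return the value passed on the command line, without any explicit set call in code.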