 

Set hadoop configuration values on spark-submit command line

We want to set the AWS parameters that, in code, would be set via the SparkContext:

sc.hadoopConfiguration.set("fs.s3a.access.key", vault.user)
sc.hadoopConfiguration.set("fs.s3a.secret.key", vault.key)

However, we have a custom Spark launcher framework that requires all custom Spark configuration to be passed via --conf parameters on the spark-submit command line.

Is there a way to "notify" the SparkContext that certain --conf values should go to its hadoopConfiguration rather than to its general SparkConf? We are looking for something along the lines of:

spark-submit --conf hadoop.fs.s3a.access.key=$vault.user --conf hadoop.fs.s3a.secret.key=$vault.key

or

spark-submit --conf hadoopConfiguration.fs.s3a.access.key=$vault.user --conf hadoopConfiguration.fs.s3a.secret.key=$vault.key
asked Mar 14 '17 at 21:03 by WestCoastProjects


People also ask

How do I submit spark application?

You can submit a Spark batch application in cluster mode (the default) or client mode, either from inside the cluster or from an external client. In cluster mode the driver runs on a host in your driver resource group; the corresponding spark-submit option is --deploy-mode cluster.
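For illustration, a cluster-mode submission might look like the following sketch (the master URL, class name, and jar name are placeholders, not taken from the question):

spark-submit --master yarn --deploy-mode cluster --class com.example.MyApp my-app.jar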

How do I submit a spark job locally?

Use spark://HOST:PORT for a standalone cluster, replacing the host and port with those of your standalone master. Use local to run locally with one worker thread, or local[k] to run with k worker threads, where k is typically the number of cores available locally.
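As a sketch, a purely local run with four worker threads could be launched like this (class and jar names are placeholders):

spark-submit --master local[4] --class com.example.MyApp my-app.jar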


1 Answer

You need to prefix Hadoop configuration keys with spark.hadoop. on the command line (or in the SparkConf object). For example:

spark.hadoop.fs.s3a.access.key=value
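
Applied to the question, the whole submission might look roughly like this sketch (the shell variables and the application class/jar are placeholders; the essential part is the spark.hadoop. prefix on each key):

spark-submit \
  --conf spark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY \
  --conf spark.hadoop.fs.s3a.secret.key=$AWS_SECRET_KEY \
  --class com.example.MyApp \
  my-app.jar

Spark strips the spark.hadoop. prefix and copies the remaining key/value pairs into the Hadoop Configuration used by the SparkContext, so sc.hadoopConfiguration.get("fs.s3a.access.key") will return the value passed on the command line.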
answered Oct 10 '22 at 23:10 by vanza