 

Set hadoop configuration values on spark-submit command line

We want to set the AWS parameters that, in code, would be set via the SparkContext:

sc.hadoopConfiguration.set("fs.s3a.access.key", vault.user)
sc.hadoopConfiguration.set("fs.s3a.secret.key", vault.key)

However, we have a custom Spark launcher framework that requires all custom Spark configuration to be passed via --conf parameters on the spark-submit command line.

Is there a way to "notify" the SparkContext that certain --conf values should go to its hadoopConfiguration rather than to its general SparkConf? We are looking for something along the lines of:

spark-submit --conf hadoop.fs.s3a.access.key=$vault.user --conf hadoop.fs.s3a.secret.key=$vault.key

or

spark-submit --conf hadoopConfiguration.fs.s3a.access.key=$vault.user --conf hadoopConfiguration.fs.s3a.secret.key=$vault.key
asked Mar 14 '17 at 21:03 by WestCoastProjects


People also ask

How do I submit spark application?

You can submit a Spark batch application in cluster mode (the default) or client mode, either from inside the cluster or from an external client. In cluster mode the driver runs on a host in your driver resource group; the corresponding spark-submit option is --deploy-mode cluster.
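For illustration, a cluster-mode submission might look like the following sketch (the master URL, class name, and jar name are placeholders, not taken from the question):

spark-submit --master yarn --deploy-mode cluster --class com.example.MyApp my-app.jar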

How do I submit a spark job locally?

Use spark://HOST:PORT for a standalone cluster, replacing the host and port with those of your standalone master. Use local to run locally with one worker thread, or local[k] to run with k worker threads, where k is typically the number of cores available locally.
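As a sketch, a purely local run with four worker threads could be launched like this (class and jar names are placeholders):

spark-submit --master local[4] --class com.example.MyApp my-app.jar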


1 Answer

You need to prefix Hadoop configuration keys with spark.hadoop. on the command line (or in the SparkConf object). For example:

spark.hadoop.fs.s3a.access.key=value
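
Applied to the question, the whole submission might look roughly like this sketch (the shell variables and the application class/jar are placeholders; the essential part is the spark.hadoop. prefix on each key):

spark-submit \
  --conf spark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY \
  --conf spark.hadoop.fs.s3a.secret.key=$AWS_SECRET_KEY \
  --class com.example.MyApp \
  my-app.jar

Spark strips the spark.hadoop. prefix and copies the remaining key/value pairs into the Hadoop Configuration used by the SparkContext, so sc.hadoopConfiguration.get("fs.s3a.access.key") will return the value passed on the command line.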
answered Oct 10 '22 at 23:10 by vanza