Passing command line arguments to Spark-shell

Tags:

apache-spark

I have a spark job written in scala. I use

spark-shell -i <file-name>

to run the job. I need to pass a command-line argument to the job. Right now, I invoke the script through a linux task, where I do

export INPUT_DATE=2015/04/27

and use the environment variable option to access the value using:

System.getenv("INPUT_DATE")

Is there a better way to handle the command line arguments in Spark-shell?

asked Apr 28 '15 20:04

Jeevs

2 Answers

My solution is to use a custom configuration key to pass the arguments instead of spark.driver.extraJavaOptions, in case you someday pass in a value that interferes with the JVM's behavior.

spark-shell -i your_script.scala --conf spark.driver.args="arg1 arg2 arg3"

You can access the arguments from within your scala code like this:

val args = sc.getConf.get("spark.driver.args").split("\\s+")
args: Array[String] = Array(arg1, arg2, arg3)
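As a rough sketch of how this could be wrapped in a script, the snippet below builds the spark-shell command line in a bash array and prints it instead of launching Spark. The names build_spark_cmd and SPARK_CMD are hypothetical and exist only for this illustration; spark.driver.args is the custom key from the answer above.

```shell
#!/bin/bash
# Hypothetical helper: fill the global array SPARK_CMD with a spark-shell
# invocation that forwards the given arguments through spark.driver.args.
build_spark_cmd() {
  local scala_file=$1
  shift
  # "$*" joins the remaining arguments with single spaces, so the whole list
  # travels as one --conf value even though it contains whitespace.
  SPARK_CMD=(spark-shell -i "$scala_file" --conf "spark.driver.args=$*")
}

build_spark_cmd your_script.scala arg1 arg2 arg3

# Dry run: print one array element per line instead of launching Spark.
printf '%s\n' "${SPARK_CMD[@]}"
```

To actually launch the shell, the array could be expanded as "${SPARK_CMD[@]}". Because the Scala side splits the value on whitespace, an argument that itself contains a space would arrive as several elements.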

answered Sep 17 '22 19:09

soulmachine

Short answer:

spark-shell -i <(echo "val theDate = \"$INPUT_DATE\"" ; cat <file-name>)

Long answer:

This solution adds the following line to the beginning of the file before it is passed to spark-shell:

val theDate = ...,

thereby defining a new variable. The <( ... ) syntax that makes this work is called process substitution, and it is available in Bash. See this question for more on this, and for alternatives (e.g. mkfifo) for non-Bash environments.
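To see what process substitution hands to the reading program, here is a minimal sketch that uses cat as a stand-in for spark-shell, so it runs without Spark installed. The helper name with_date_prelude and the temporary demo file are assumptions made for this illustration only.

```shell
#!/bin/bash
# Hypothetical helper: emit a Scala val definition followed by the script body.
with_date_prelude() {
  local input_date=$1 scala_file=$2
  # Quote the value so Scala sees a String rather than the expression 2015/04/27.
  printf 'val theDate = "%s"\n' "$input_date"
  cat "$scala_file"
}

# Throwaway Scala script used only for this demo.
demo_file=$(mktemp)
printf 'println(theDate)\n' > "$demo_file"

# <( ... ) expands to a path (for example /dev/fd/63) that the reader opens
# like an ordinary file; cat plays the role of spark-shell -i here.
combined=$(cat <(with_date_prelude "2015/04/27" "$demo_file"))
printf '%s\n' "$combined"

rm -f "$demo_file"
```

The printed text is what spark-shell would read through -i: the val definition on the first line, then the original script unchanged.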

Making this more systematic:

Put the code below in a script (e.g. spark-script.sh), and then you can simply use:

./spark-script.sh your_file.scala first_arg second_arg third_arg, and have an Array[String] called args with your arguments.

The file spark-script.sh:

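#!/bin/bash
# Usage: ./spark-script.sh your_file.scala [args...]
scala_file=$1
shift 1
# Join the remaining arguments into one space-separated string.
arguments="$*"

# If bash runs in POSIX mode (e.g. when invoked as sh), process substitution
# may be unavailable; uncommenting the next line turns POSIX mode off.
#set +o posix

# Prepend a definition of args to the Scala file and pass the result to spark-shell.
spark-shell --master yarn --deploy-mode client \
        --queue default \
        --driver-memory 2G --executor-memory 4G \
        --num-executors 10 \
        -i <(echo 'val args = "'"$arguments"'".split("\\s+")' ; cat "$scala_file")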
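Because the script joins the arguments into one string and splits it again on whitespace, an argument such as "second arg" would arrive as two elements. One possible variation, sketched here under that assumption, renders each argument as its own Scala string literal. The function name scala_args_line is hypothetical and not part of the original answer.

```shell
#!/bin/bash
# Hypothetical helper: render the positional arguments as a Scala Array literal,
# one string literal per argument, so embedded spaces survive.
scala_args_line() {
  local out='val args = Array[String](' sep='' a esc
  for a in "$@"; do
    # Escape backslashes first, then double quotes, for a Scala string literal.
    esc=$(printf '%s' "$a" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    out="$out$sep\"$esc\""
    sep=', '
  done
  printf '%s)\n' "$out"
}

scala_args_line first_arg "second arg" 'say "hi"'
```

In the wrapper script this could replace the echo, for example -i <(scala_args_line "$@" ; cat "$scala_file"). Arguments containing newlines are not handled by this sketch.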

answered Sep 17 '22 19:09

Amir
