How to pass parameters / properties to Spark jobs with spark-submit

I am running a Spark job implemented in Java using spark-submit. I would like to pass parameters to this job, e.g. a time-start and a time-end parameter, to parametrize the Spark application.

What I tried was using the

--conf key=value

option of the spark-submit script, but when I try to read the parameter in my Spark job with

sparkContext.getConf().get("key")

I get an exception:

Exception in thread "main" java.util.NoSuchElementException: key

Furthermore, when I use sparkContext.getConf().toDebugString() I don't see my value in the output.

Further notice: since I want to submit my Spark job via the Spark REST service, I cannot use an OS environment variable or the like.

Is there any possibility to implement this?

asked Nov 10 '16 by Michael Lihs

3 Answers

Spark configuration will only pick up keys in the spark namespace (i.e. keys prefixed with spark.). If you don't want to use an independent configuration tool, you can try:

--conf spark.mynamespace.key=value
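
For illustration, a minimal sketch of reading such a value back inside a Java job (the class name MyJob is just a placeholder, and spark.mynamespace.key mirrors the submit option above):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class MyJob {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("MyJob");
        // Visible here because the key starts with "spark.", so
        // spark-submit forwards it into the job's configuration.
        String value = conf.get("spark.mynamespace.key");
        System.out.println("spark.mynamespace.key = " + value);

        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... actual job logic ...
        sc.stop();
    }
}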
answered Sep 21 '22 by user6022341


Since you want to use your own custom properties, you need to place them after the application JAR in the spark-submit command (as in the Spark example below): [application-arguments] should be your properties, while --conf is reserved for Spark configuration properties.

--conf: Arbitrary Spark configuration property in key=value format. For values that contain spaces wrap “key=value” in quotes (as shown).

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # options
  <application-jar> \
  [application-arguments]  <--- your application arguments go here

So when you run spark-submit .... app.jar key=value, you will get args[0] as "key=value" in the main method.

public static void main(String[] args) {
    String firstArg = args[0]; // equal to "key=value"
}

But if you want to use key-value pairs, you need to parse your application arguments somehow.

You can check the Apache Commons CLI library or some alternative.
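
Before reaching for a library, a hand-rolled parse of the key=value form above might look like this sketch (the time-start/time-end names are borrowed from the question, and the ParamParse class name is made up):

import java.util.HashMap;
import java.util.Map;

public class ParamParse {
    public static void main(String[] args) {
        // Split each "key=value" argument at the first '=' and collect into a map.
        Map<String, String> params = new HashMap<>();
        for (String arg : args) {
            int idx = arg.indexOf('=');
            if (idx > 0) {
                params.put(arg.substring(0, idx), arg.substring(idx + 1));
            }
        }
        // e.g. submitted as: spark-submit ... app.jar time-start=2016-11-01 time-end=2016-11-10
        String timeStart = params.get("time-start");
        String timeEnd = params.get("time-end");
        System.out.println("time-start=" + timeStart + ", time-end=" + timeEnd);
    }
}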

answered Sep 24 '22 by VladoDemcak


You can pass parameters like this:

./bin/spark-submit \
  --class $classname \
  --master XXX \
  --deploy-mode XXX \
  --conf XXX \
  $application_jar --key1 $value1 --key2 $value2

Make sure to replace key1, key2, value1, and value2 with proper values.
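
On the application side these arrive as plain strings in args, so a minimal sketch for collecting --key value pairs could look like this (the FlagArgs class name and the two-at-a-time parsing strategy are just illustrative):

import java.util.HashMap;
import java.util.Map;

public class FlagArgs {
    public static void main(String[] args) {
        // Walk the arguments two at a time, treating "--name value" as a pair.
        Map<String, String> params = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            if (args[i].startsWith("--")) {
                params.put(args[i].substring(2), args[i + 1]);
            }
        }
        System.out.println("key1 = " + params.get("key1"));
        System.out.println("key2 = " + params.get("key2"));
    }
}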

answered Sep 23 '22 by renzherl