 

How to specify which java version to use in spark-submit command?

I want to run a Spark Streaming application on a YARN cluster on a remote server. The default Java version there is 1.7, but I want to use 1.8 for my application; 1.8 is also installed on the server but is not the default. Is there a way to specify the location of Java 1.8 through spark-submit so that I do not get the major.minor error?

Priyanka asked Apr 26 '16 11:04



2 Answers

Setting JAVA_HOME was not enough in our case: the driver was running on Java 8, but I discovered later that the Spark workers in YARN were launched using Java 7 (the Hadoop nodes have both Java versions installed).

I had to add spark.executorEnv.JAVA_HOME=/usr/java/<version available in workers> to spark-defaults.conf. Note that you can also provide it on the command line with --conf.
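As a rough sketch of the --conf form, the property can be passed straight to spark-submit. The Java path, class name and jar path below are hypothetical placeholders, not values from this answer; the only part taken from the answer is the spark.executorEnv.JAVA_HOME property itself.

```shell
# Hypothetical Java 8 location on the worker nodes; substitute the real path.
JAVA8_HOME="/usr/lib/jvm/java-8-openjdk"

# The same property the answer puts in spark-defaults.conf, as key=value.
EXECUTOR_CONF="spark.executorEnv.JAVA_HOME=${JAVA8_HOME}"

# Dry run: prints the command instead of running it. Remove "echo" to submit.
# The class name and jar path are placeholders for your own application.
echo spark-submit \
  --master yarn \
  --conf "${EXECUTOR_CONF}" \
  --class com.example.StreamingApp \
  /path/to/streaming-app.jar
```

Keeping the path in one variable makes it easy to check that it matches a Java installation that actually exists on every worker before submitting.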

See http://spark.apache.org/docs/latest/configuration.html#runtime-environment

mathieu answered Oct 19 '22 21:10



Although you can force the driver code to run on a particular Java version (export JAVA_HOME=/path/to/jre/ && spark-submit ... ), the workers will execute the code with the default Java version found on the yarn user's PATH on each worker machine.

What you can do is set each Spark instance to use a particular JAVA_HOME by editing its spark-env.sh file (see the Spark documentation).
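A minimal sketch of such an entry, assuming Java 8 is installed at the hypothetical path shown (the real location depends on the node):

```shell
# Lines for conf/spark-env.sh on each node; Spark's launch scripts source this file.
# Hypothetical install path; replace with the Java 8 location on that machine.
export JAVA_HOME="/usr/lib/jvm/java-8-openjdk"
```

The file has to be edited on every machine that should pick up the setting, since each Spark instance reads its own copy.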

Radu answered Oct 19 '22 22:10
