
Multiple driver-java-options in spark submit

I am running spark-submit from a bash script, with its options defined as:

CLUSTER_OPTIONS=" \
--master yarn-cluster \
--files     file:///${CONF_DIR}/app.conf#app.conf,file:///${CONF_DIR}/log4j-executor.xml#log4j.xml \
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.xml" \
--driver-java-options '-Dlog4j.configuration=file:log4j.xml -Dconfig.file=app.conf' \
--keytab ${KEYTAB} \
--principal ${PRINCIPAL} \
"

app.conf is not being picked up, and I receive this error:

Error: Unrecognized option: -Dconfig.file=file:app.conf'

I have also tried other ways of quoting the --driver-java-options value:

1)

--driver-java-options \"-Dlog4j.configuration=file:log4j.xml -Dconfig.file=app.conf\" \

Error: Unrecognized option: -Dconfig.file=file:app.conf"

2)

--driver-java-options "-Dlog4j.configuration=file:log4j.xml -Dconfig.file=file:transformation.conf" \


./start_app.sh: line 30: -Dconfig.file=file:app.conf --keytab /app/conf/keytab/principal.keytab --principal principal : No such file or directory

How can I specify multiple driver Java options for my Spark app?

N.B. I am using Spark 1.5.0

eboni asked May 23 '17 08:05


1 Answer

I am posting this because the behavior was so odd: it only worked once I made --driver-java-options the first of all the arguments. I have left the full command as is so you can see everything.

Using PySpark in local mode:

/opt/apache-spark/spark-2.3.0-bin-hadoop2.7/bin/spark-submit \
    --driver-java-options "-Xms2G -Doracle.jdbc.Trace=true -Djava.util.logging.config.file=/opt/apache-spark/spark-2.3.0-bin-hadoop2.7/conf/oraclejdbclog.properties -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.port=1098 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.net.preferIPv4Stack=true -Djava.rmi.server.hostname=192.168.2.120 -Dcom.sun.management.jmxremote.rmi.port=1095" \
    --driver-memory $_driver_memory \
    --executor-memory $_executor_memory \
    --total-executor-cores $_total_executor_cores \
    --verbose \
    --jars /opt/apache-spark/jars/log4j-1.2.17.jar main.py \
    --dbprefix $1 \
    --copyfrom $2

Hope this helps someone.
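
A likely cause of the errors in the question, offered as a sketch and not a verified diagnosis of that exact setup: when options are stored in a plain string and expanded unquoted, bash splits the string on whitespace and treats any quote characters inside it as literal text. That would explain why the reported option ends in a stray ' or ". Keeping the options in a bash array avoids this, because each element is passed as exactly one argument. In the example below, show_args and OPTS_STRING are illustrative names, the values of CONF_DIR, KEYTAB and PRINCIPAL are placeholders, and the jar and class name in the final comment are hypothetical.

```shell
#!/usr/bin/env bash
# Placeholder values (assumptions); substitute your own paths and principal.
CONF_DIR="/app/conf"
KEYTAB="/app/conf/keytab/principal.keytab"
PRINCIPAL="principal"

# Helper (illustrative): print each argument on its own line, in brackets.
show_args() { printf '[%s]\n' "$@"; }

# Problem: quotes stored inside a string are not re-parsed on expansion.
OPTS_STRING="--driver-java-options '-Da=1 -Db=2'"
show_args $OPTS_STRING
# Prints three arguments, with the quotes kept as literal characters:
# [--driver-java-options]
# ['-Da=1]
# [-Db=2']

# Fix: one array element per argument. The space inside the
# --driver-java-options value no longer splits it.
CLUSTER_OPTIONS=(
  --master yarn-cluster
  --files "file:///${CONF_DIR}/app.conf#app.conf,file:///${CONF_DIR}/log4j-executor.xml#log4j.xml"
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.xml"
  --driver-java-options "-Dlog4j.configuration=file:log4j.xml -Dconfig.file=app.conf"
  --keytab "${KEYTAB}"
  --principal "${PRINCIPAL}"
)

# Check the splitting before a real run.
show_args "${CLUSTER_OPTIONS[@]}"

# Real invocation (app.jar and com.example.Main are hypothetical):
# spark-submit "${CLUSTER_OPTIONS[@]}" --class com.example.Main app.jar
```

The quoted "${CLUSTER_OPTIONS[@]}" expansion is what preserves the argument boundaries; expanding the array without the double quotes would bring the word splitting back.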

user3008410 answered Oct 05 '22 16:10