
How to use custom config file for SparkSession (without using spark-submit to submit application)?

I have a standalone Python script that creates a SparkSession with the following line of code, and I can see that it configures the Spark session exactly as specified in the spark-defaults.conf file.

spark = SparkSession.builder.appName("Tester").enableHiveSupport().getOrCreate()

If I want to pass, as a parameter, another file containing the Spark configuration to use instead of spark-defaults.conf, how can I specify this when creating the SparkSession?

I can see that I can pass a SparkConf object, but is there a way to create one automatically from a file containing all the configurations?

Or do I have to parse the input file myself and set each configuration value manually?

Subramaniam Ramasubramanian asked Feb 07 '18 09:02

1 Answer

If you don't use spark-submit, your best option here is to override SPARK_CONF_DIR. Create a separate directory for each configuration set:

$ configs tree           
.
├── conf1
│   ├── docker.properties
│   ├── fairscheduler.xml
│   ├── log4j.properties
│   ├── metrics.properties
│   ├── spark-defaults.conf
│   ├── spark-defaults.conf.template
│   └── spark-env.sh
└── conf2
    ├── docker.properties
    ├── fairscheduler.xml
    ├── log4j.properties
    ├── metrics.properties
    ├── spark-defaults.conf
    ├── spark-defaults.conf.template
    └── spark-env.sh

Then set the environment variable before you initialize any JVM-dependent objects:

import os
from pyspark.sql import SparkSession

os.environ["SPARK_CONF_DIR"] = "/path/to/configs/conf1"
spark = SparkSession.builder.getOrCreate()

or

import os
from pyspark.sql import SparkSession

os.environ["SPARK_CONF_DIR"] = "/path/to/configs/conf2"
spark = SparkSession.builder.getOrCreate()
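
Since the question asks about passing the configuration as a parameter, the two snippets above could be folded into one small helper. This is only a sketch: the names select_conf_dir and start_session are hypothetical, and the check for spark-defaults.conf is an added safety measure rather than something Spark requires.

```python
import os


def select_conf_dir(conf_dir):
    """Point SPARK_CONF_DIR at conf_dir (hypothetical helper, not a Spark API).

    Must be called before any SparkSession or SparkContext is created,
    because the JVM reads the directory only once, at startup.
    """
    conf_dir = os.path.abspath(conf_dir)
    defaults = os.path.join(conf_dir, "spark-defaults.conf")
    # Added sanity check: fail early instead of silently using Spark's defaults.
    if not os.path.isfile(defaults):
        raise FileNotFoundError("no spark-defaults.conf in " + conf_dir)
    os.environ["SPARK_CONF_DIR"] = conf_dir
    return conf_dir


def start_session(conf_dir):
    """Select the config directory, then create the session."""
    select_conf_dir(conf_dir)
    # Imported here so the helper above can be used without pyspark installed.
    from pyspark.sql import SparkSession
    return SparkSession.builder.getOrCreate()


# Example call, with the directory taken from the command line:
#   spark = start_session(sys.argv[1])   # e.g. /path/to/configs/conf1
```
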

This is a workaround and might not work in complex scenarios.
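
If changing SPARK_CONF_DIR is not an option, the manual route mentioned in the question (parse the file, then build a SparkConf) can be sketched as follows. The function names read_spark_conf and build_session are hypothetical, and the parser is deliberately simplified: it assumes one "key value" or "key=value" pair per line and does not handle escapes or line continuations. SparkConf.setAll and SparkSession.builder.config(conf=...) are existing PySpark calls.

```python
import re


def read_spark_conf(path):
    """Return (key, value) pairs from a spark-defaults.conf-style file.

    Simplified parser (an assumption, not Spark's own loader): blank lines
    and lines starting with '#' are skipped; the key is separated from the
    value by whitespace or by '='.
    """
    pairs = []
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            # Split only once, so values may themselves contain spaces or '='.
            parts = re.split(r"\s*=\s*|\s+", line, maxsplit=1)
            if len(parts) == 2 and parts[1]:
                pairs.append((parts[0], parts[1]))
    return pairs


def build_session(conf_path, app_name="Tester"):
    """Create a SparkSession from the settings in conf_path."""
    # Imported here so read_spark_conf can be used without pyspark installed.
    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    conf = SparkConf().setAll(read_spark_conf(conf_path))
    return (SparkSession.builder
            .appName(app_name)
            .config(conf=conf)
            .enableHiveSupport()
            .getOrCreate())
```

Note that with this approach the spark-defaults.conf in the default configuration directory is still read, so any key missing from the custom file keeps its value from there.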

Alper t. Turker answered Sep 23 '22 13:09