 

What are SparkSession Config Options

I am trying to use SparkSession to read a JSON file and convert its data to an RDD in Spark Notebook. I already have the JSON file.

val spark = SparkSession
  .builder()
  .appName("jsonReaderApp")
  .config("config.key.here", configValueHere)
  .enableHiveSupport()
  .getOrCreate()
val jread = spark.read.json("search-results1.json")

I am very new to Spark and do not know what to use for config.key.here and configValueHere.

asked Mar 26 '17 by Sha2b


People also ask

What is SparkSession config?

SparkSession encapsulates SparkContext. It allows you to configure Spark configuration parameters, and through SparkContext the driver can access other contexts such as SQLContext, HiveContext, and StreamingContext to program Spark.
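
For example, a minimal PySpark sketch (not from the original answer) showing that the SparkContext wrapped by a SparkSession is directly reachable:

from pyspark.sql import SparkSession

# Build (or reuse) a session; the wrapped SparkContext comes with it.
spark = SparkSession.builder.appName("encapsulation-demo").getOrCreate()

sc = spark.sparkContext                   # the encapsulated SparkContext
print(sc.appName)                         # application name seen by the context
print(spark.conf.get("spark.app.name"))   # the same value via runtime config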

What is SparkSession used for?

The SparkSession is used to create and read DataFrames. It's used whenever you create a DataFrame in your test suite or whenever you read a Parquet / CSV data lake into a DataFrame. This post explains how to create a SparkSession, share it throughout your program, and use it to create DataFrames.
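
As a quick illustration, a hedged PySpark sketch (the Parquet path is hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Create a DataFrame in code ...
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
df.show()

# ... or read one from storage (hypothetical path).
# parquet_df = spark.read.parquet("/data/lake/events.parquet")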

What is the difference between SparkConf and SparkSession?

SparkSession vs SparkContext: in earlier versions of Spark and PySpark, SparkContext (JavaSparkContext for Java) was the entry point to Spark programming with RDDs and for connecting to the Spark cluster. Since Spark 2.0, SparkSession has been the entry point for programming with DataFrames and Datasets.
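
A small sketch of the two entry points side by side (assuming a local PySpark setup):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# RDD API: entered through SparkContext (the pre-2.0 entry point).
rdd = sc.parallelize([1, 2, 3])
print(rdd.sum())

# DataFrame/Dataset API: entered through SparkSession (Spark 2.0+).
spark.range(3).show()   # DataFrame with a single 'id' column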

What is appName in SparkSession?

appName(name) sets a name for the application, which will be shown in the Spark web UI. If no application name is set, a randomly generated name is used. New in version 2.0.
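
For instance (a minimal sketch; the name itself is arbitrary):

from pyspark.sql import SparkSession

# Note: appName only takes effect if this call actually creates the session;
# getOrCreate() reuses an existing session, name included.
spark = SparkSession.builder.appName("my-web-ui-name").getOrCreate()
print(spark.sparkContext.appName)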


3 Answers

SparkSession

To get all the "various Spark parameters as key-value pairs" for a SparkSession, "the entry point to programming Spark with the Dataset and DataFrame API," run the following (this uses the Spark Python API; Scala would be very similar).

from pyspark import SparkConf
from pyspark.sql import SparkSession

# Get (or create) the active session, then list the configured parameters.
spark = SparkSession.builder.getOrCreate()
SparkConf().getAll()

or without importing SparkConf:

spark.sparkContext.getConf().getAll()

Depending on which API you are using, see one of the following:

  1. https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/SparkSession.html
  2. https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/spark_session.html
  3. https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/SparkSession.html

You can get a deeper-level list of SparkSession configuration options by running the code below. Most entries are the same, but there are a few extra ones. I am not sure whether you can change these.

spark.sparkContext._conf.getAll()  
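
If you want to see exactly which extra keys that deeper view reports, here is a small sketch that diffs the two (note that _conf is a private attribute and may change between Spark versions):

# Compare the public and the private views of the configuration.
public_keys = {k for k, _ in spark.sparkContext.getConf().getAll()}
deep_keys = {k for k, _ in spark.sparkContext._conf.getAll()}
print(sorted(deep_keys - public_keys))   # keys only the deeper view has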

SparkContext

To get all the "various Spark parameters as key-value pairs" for a SparkContext, the "main entry point for Spark functionality," ... "connection to a Spark cluster," ... used "to create RDDs, accumulators and broadcast variables on that cluster," run the following.

from pyspark import SparkConf, SparkContext

# Build a configuration, start a context from it, then list its parameters.
spark_conf = SparkConf().setAppName("test")
sc = SparkContext(conf=spark_conf)
sc.getConf().getAll()

Depending on which API you are using, see one of the following:

  1. https://spark.apache.org/docs/latest/api/scala/org/apache/spark/SparkContext.html
  2. https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.html
  3. https://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html

Spark parameters

You should get a list of tuples that contain the "various Spark parameters as key-value pairs" similar to the following:

[(u'spark.eventLog.enabled', u'true'),
 (u'spark.yarn.appMasterEnv.PYSPARK_PYTHON', u'/<yourpath>/parcels/Anaconda-4.2.0/bin/python'),
 ...
 ...
 (u'spark.yarn.jars', u'local:/<yourpath>/lib/spark2/jars/*')]

Depending on which API you are using, see one of the following:

  1. https://spark.apache.org/docs/latest/api/scala/org/apache/spark/SparkConf.html
  2. https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkConf.html
  3. https://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkConf.html

For a complete list of Spark properties, see:
http://spark.apache.org/docs/latest/configuration.html#viewing-spark-properties
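
Relatedly (not from the original answer), Spark SQL can describe its own configuration properties at runtime; a minimal sketch, assuming an active session:

# 'SET -v' lists Spark SQL properties with current values and descriptions.
spark.sql("SET -v").show(n=5, truncate=False)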

Setting Spark parameters

Each tuple is ("spark.some.config.option", "some-value"), which you can set in your application with:

SparkSession

spark = (
    SparkSession
    .builder
    .appName("Your App Name")
    .config("spark.some.config.option1", "some-value")
    .config("spark.some.config.option2", "some-value")
    .getOrCreate())

sc = spark.sparkContext
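
You can read a value back at runtime to confirm it took effect (using the placeholder key from above):

print(spark.conf.get("spark.some.config.option1"))   # -> 'some-value'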

SparkContext

spark_conf = (
    SparkConf()
    .setAppName("Your App Name")
    .set("spark.some.config.option1", "some-value")
    .set("spark.some.config.option2", "some-value"))

sc = SparkContext(conf=spark_conf)

spark-defaults

You can also set the Spark parameters in a spark-defaults.conf file:

spark.some.config.option1 some-value
spark.some.config.option2 "some-value"

then run your Spark application with spark-submit (pyspark):

spark-submit \
--properties-file path/to/your/spark-defaults.conf \
--name "Your App Name" \
--py-files path/to/your/supporting/pyspark_files.zip \
path/to/your/pyspark_main.py
answered Oct 11 '22 by Clay


This is how I added Spark and Hive settings in my Scala code:

val spark = SparkSession
    .builder()
    .appName("StructStreaming")
    .master("yarn")
    .config("hive.merge.mapfiles", "false")
    .config("hive.merge.tezfiles", "false")
    .config("parquet.enable.summary-metadata", "false")
    .config("spark.sql.parquet.mergeSchema", "false")
    .config("hive.merge.smallfiles.avgsize", "160000000")
    .enableHiveSupport()
    .config("hive.exec.dynamic.partition", "true")
    .config("hive.exec.dynamic.partition.mode", "nonstrict")
    .config("spark.sql.orc.impl", "native")
    .config("spark.sql.parquet.binaryAsString", "true")
    .config("spark.sql.parquet.writeLegacyFormat", "true")
    //.config("spark.sql.streaming.checkpointLocation", "hdfs://pp/apps/hive/warehouse/dev01_landing_initial_area.db")
    .getOrCreate()
answered Oct 11 '22 by Jeff A.


The easiest way to set some config:

spark.conf.set("spark.sql.shuffle.partitions", 500)

Here spark refers to a SparkSession; that way you can set configs at runtime. This is really useful when you want to change configs repeatedly to tune Spark parameters for specific queries.
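
A short sketch of that pattern (the DataFrames are hypothetical; only spark is assumed to exist):

# Tune shuffle parallelism for one heavy query, then restore the old value.
default_partitions = spark.conf.get("spark.sql.shuffle.partitions")

spark.conf.set("spark.sql.shuffle.partitions", 500)
# big_df.join(other_df, "id").count()   # hypothetical heavy query

spark.conf.set("spark.sql.shuffle.partitions", default_partitions)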

answered Oct 11 '22 by kar09