I have a Spark application that uses the new Spark 2.0 API with SparkSession. I am building this application on top of another application that uses SparkContext. I would like to pass the SparkContext to my application and initialize a SparkSession from the existing SparkContext.
However, I could not find a way to do that. I found that the SparkSession constructor that takes a SparkContext is private, so I can't initialize it that way, and the builder does not offer any setSparkContext method. Do you think some workaround exists?
Deriving the SparkSession object out of a SparkContext, or even a SparkConf, is easy; you might just find the API slightly convoluted. Here's an example (I'm using Spark 2.4, but this should work in the older 2.x releases as well):
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

// If you already have a SparkContext stored in `sc`
val spark = SparkSession.builder.config(sc.getConf).getOrCreate()

// Another example which builds a SparkConf, SparkContext and SparkSession
val conf = new SparkConf().setAppName("spark-test").setMaster("local[2]")
val sc = new SparkContext(conf)
val spark = SparkSession.builder.config(sc.getConf).getOrCreate()
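As a sanity check: `getOrCreate()` will pick up the already-running SparkContext in the JVM rather than starting a second one, so the session's context is the very same object as the one your host application created. A minimal sketch of that check (the app name and master here are just placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

// Simulate the "other" application that already owns a SparkContext
val conf = new SparkConf().setAppName("existing-app").setMaster("local[2]")
val sc = new SparkContext(conf)

// Build a SparkSession on top of the existing context
val spark = SparkSession.builder.config(sc.getConf).getOrCreate()

// getOrCreate() reused the active context instead of creating a new one
assert(spark.sparkContext eq sc)

spark.stop()
```

This also means you don't need to (and can't) force a specific SparkContext into the builder: as long as the context is active when `getOrCreate()` runs, the resulting session wraps it.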
Hope that helps!