I have a question regarding creating multiple Spark sessions in one JVM. I have read that creating multiple contexts is not recommended in earlier versions of Spark. Is this true of SparkSession in Spark 2.0 as well?
I am thinking of making a call to a web service or a servlet from the UI; the service creates a Spark session, performs some operation, and returns the result. This will result in a Spark session being created for every request from the client side. Is this practice recommended?
Say I have a method something like:
import org.apache.spark.sql.SparkSession;

public void runSpark() throws Exception {
    SparkSession spark = SparkSession
        .builder()
        .master("spark://<masterURL>")
        .appName("JavaWordCount")
        .getOrCreate();
    // ... run the Spark job, and so on ...
}
If I put this method in a web service, will there be any JVM issues? As it stands, I am able to invoke this method multiple times from a main method, but I am not sure if this is good practice.
Since the question talks about SparkSessions, it's important to point out that there can be multiple SparkSessions running but only a single SparkContext per JVM.
Starting in Spark 2.0, the SparkSession encapsulates both. Spark applications can use multiple sessions to use different underlying data catalogs. You can use an existing Spark session to create a new session by calling the newSession method.
Only one SparkContext may be active per JVM. You must stop() the active SparkContext before creating a new one. The first thing a Spark program must do is to create a JavaSparkContext object, which tells Spark how to access a cluster.
Note: we can have multiple Spark contexts by setting spark.driver.allowMultipleContexts to true. However, having multiple Spark contexts in the same JVM is not encouraged and is not considered good practice, as it makes the application less stable, and a crash of one Spark context can affect the others.
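If you want to see what that flag looks like in practice, here is a minimal sketch (Scala; the app names and local master are placeholders). Again, this is discouraged and shown only for completeness:

import org.apache.spark.{SparkConf, SparkContext}

// Both confs opt in to the (discouraged) multiple-contexts behaviour.
val conf1 = new SparkConf()
  .setAppName("first-context")
  .setMaster("local[*]")
  .set("spark.driver.allowMultipleContexts", "true")

val conf2 = new SparkConf()
  .setAppName("second-context")
  .setMaster("local[*]")
  .set("spark.driver.allowMultipleContexts", "true")

val sc1 = new SparkContext(conf1)
val sc2 = new SparkContext(conf2)  // would normally throw; only allowed because of the flag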
The documentation of getOrCreate states:
This method first checks whether there is a valid thread-local SparkSession, and if yes, return that one. It then checks whether there is a valid global default SparkSession, and if yes, return that one. If no valid global default SparkSession exists, the method creates a new SparkSession and assigns the newly created SparkSession as the global default.
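To illustrate that behaviour, here is a small sketch (Scala; the app name and local master are placeholders): calling getOrCreate a second time in the same JVM returns the existing global default session instead of creating a new one.

import org.apache.spark.sql.SparkSession

val first = SparkSession.builder().appName("demo").master("local[*]").getOrCreate()
val second = SparkSession.builder().getOrCreate()

// Both variables reference the same global default session.
println(first eq second)  // true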
There is also the method SparkSession.newSession that indicates:
Start a new session with isolated SQL configurations, temporary tables, registered functions are isolated, but sharing the underlying SparkContext and cached data.
So I guess the answer to your question is that you can have multiple sessions, but there is still a single SparkContext per JVM that will be used by all your sessions.
I could imagine that a possible scenario for your web application could be to create one SparkSession per request or, e.g., per HTTP session, and use it to isolate Spark executions per request or per user session. <-- Since I'm pretty new to Spark, can someone confirm this?
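A minimal sketch of that per-request idea (Scala; SparkService, handleRequest, and the master URL are hypothetical placeholders, not part of any framework):

import org.apache.spark.sql.SparkSession

object SparkService {
  // Created once per JVM; this session owns the single SparkContext.
  private val globalSession: SparkSession = SparkSession.builder()
    .appName("web-service")
    .master("spark://<masterURL>")
    .getOrCreate()

  // Called by the web layer for each incoming request.
  def handleRequest(query: String): Long = {
    // Each request gets its own session with isolated SQL conf and temp views,
    // while sharing the underlying SparkContext and cached data.
    val session = globalSession.newSession()
    session.sql(query).count()
  }
}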
If you have an existing Spark session and want to create a new one, use the newSession method on the existing SparkSession.
import org.apache.spark.sql.SparkSession

// `spark` is the existing SparkSession
val newSparkSession = spark.newSession()
The newSession method creates a new Spark session with isolated SQL configurations and temporary tables. The new session will share the underlying SparkContext and cached data.
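A quick sketch of that isolation, assuming the spark and newSparkSession values from the snippet above:

// A temp view registered in one session is not visible in the other,
// but both sessions share the same SparkContext.
spark.range(10).createOrReplaceTempView("numbers")

println(spark.catalog.tableExists("numbers"))                // true
println(newSparkSession.catalog.tableExists("numbers"))      // false: temp views are isolated
println(spark.sparkContext eq newSparkSession.sparkContext)  // true: shared context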
It is not supported and won't be: SPARK-2243 (support for multiple SparkContexts in the same JVM) is resolved as Won't Fix.
If you need multiple contexts, there are different projects which can help you (Mist, Livy).