
Multiple SparkSessions in single JVM

Tags:

apache-spark

I have a question about creating multiple Spark sessions in one JVM. I have read that creating multiple contexts is not recommended in earlier versions of Spark. Is this also true for SparkSession in Spark 2.0?

I am thinking of calling a web service or a servlet from the UI; the service creates a Spark session, performs some operation and returns the result. This means a Spark session would be created for every request from the client side. Is this practice recommended?

Say I have a method like this:

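import org.apache.spark.sql.SparkSession;

public void runSpark() throws Exception {

        // getOrCreate() reuses a valid existing session (thread-local first,
        // then the global default) and only builds a new one if none exists
        SparkSession spark = SparkSession
          .builder()
          .master("spark://<masterURL>")
          .appName("JavaWordCount")
          .getOrCreate();

        // ... and so on
}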

If I put this method in a web service, will there be any JVM issues? I am able to invoke this method multiple times from a main method, but I am not sure whether this is good practice.

asked Oct 20 '16 11:10 by Rishi S


People also ask

Is it possible to have multiple SparkContext in single JVM?

Since the question is about SparkSessions, it is important to point out that there can be multiple SparkSessions running, but only a single SparkContext per JVM.

Can we create multiple SparkSession?

Starting in Spark 2.0, the SparkSession encapsulates both. Spark applications can use multiple sessions to use different underlying data catalogs. You can use an existing Spark session to create a new session by calling the newSession method.

How many SparkContext can be created?

Only one SparkContext may be active per JVM. You must stop() the active SparkContext before creating a new one. The first thing a Spark program must do is to create a JavaSparkContext object, which tells Spark how to access a cluster.
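The "one active context per JVM, stop() before creating another" rule can be modeled in plain Java as a class that refuses to create a second instance while one is still active. This is a hypothetical sketch, not Spark's implementation: SingleContextModel and its Context class are illustrative names, and the exception type and message are assumptions.

```java
/**
 * Hypothetical model of the "only one active context per JVM" rule.
 * Not Spark's code; names and the exception type are illustrative.
 */
public class SingleContextModel {

    public static final class Context {
        // The currently active context, if any (guarded by Context.class).
        private static Context active;

        private Context() {}

        /** Creates a context, but only if no other context is active. */
        public static synchronized Context create() {
            if (active != null) {
                throw new IllegalStateException(
                        "Only one context may be active per JVM; stop() it first");
            }
            active = new Context();
            return active;
        }

        /** Stopping the active context frees the slot for a new one. */
        public void stop() {
            synchronized (Context.class) {
                if (active == this) {
                    active = null;
                }
            }
        }
    }

    public static void main(String[] args) {
        Context first = Context.create();
        try {
            Context.create(); // rejected: first is still active
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        first.stop();
        Context second = Context.create(); // allowed after stop()
        System.out.println(second != first); // true
        second.stop();
    }
}
```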

Can we create multiple Spark context?

Note: multiple SparkContexts could be created by setting spark.driver.allowMultipleContexts to true (this flag was removed in Spark 3.0). However, running multiple SparkContexts in the same JVM is discouraged and not considered good practice: it makes the application less stable, and a crash in one SparkContext can affect the others.


3 Answers

The documentation of getOrCreate states:

This method first checks whether there is a valid thread-local SparkSession, and if yes, return that one. It then checks whether there is a valid global default SparkSession, and if yes, return that one. If no valid global default SparkSession exists, the method creates a new SparkSession and assigns the newly created SparkSession as the global default.
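To make the lookup order in that quote concrete, here is a small, self-contained Java model of it. It is a sketch under stated assumptions, not Spark's code: the class GetOrCreateModel, its Session stand-in and the setThreadSession/clearThreadSession helpers are hypothetical, and "valid" is assumed to mean "not stopped".

```java
/**
 * Hypothetical, simplified model of the lookup order described in the
 * getOrCreate documentation. This is NOT Spark's implementation.
 */
public class GetOrCreateModel {

    /** Stand-in for a SparkSession; "valid" here means "not stopped". */
    public static final class Session {
        private volatile boolean stopped = false;

        public void stop() { stopped = true; }

        public boolean isValid() { return !stopped; }
    }

    // Session bound to the current thread (the "thread-local SparkSession").
    private static final ThreadLocal<Session> THREAD_SESSION = new ThreadLocal<>();
    // JVM-wide default session (guarded by GetOrCreateModel.class).
    private static Session defaultSession;

    public static void setThreadSession(Session s) { THREAD_SESSION.set(s); }

    public static void clearThreadSession() { THREAD_SESSION.remove(); }

    public static synchronized Session getOrCreate() {
        // 1. A valid thread-local session is returned first.
        Session local = THREAD_SESSION.get();
        if (local != null && local.isValid()) {
            return local;
        }
        // 2. Otherwise a valid global default session is returned.
        if (defaultSession != null && defaultSession.isValid()) {
            return defaultSession;
        }
        // 3. Otherwise a new session is created and becomes the global default.
        defaultSession = new Session();
        return defaultSession;
    }

    public static void main(String[] args) {
        Session first = getOrCreate();
        System.out.println(first == getOrCreate()); // true: default is reused

        Session perThread = new Session();
        setThreadSession(perThread);
        System.out.println(getOrCreate() == perThread); // true: thread-local wins
        clearThreadSession();

        first.stop();
        System.out.println(getOrCreate() == first); // false: stopped default is replaced
    }
}
```

For the web-service scenario in the question, this is why calling getOrCreate() on every request normally hands back the same session instead of creating a new one.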

There is also the method SparkSession.newSession, whose documentation states:

Start a new session with isolated SQL configurations, temporary tables, registered functions are isolated, but sharing the underlying SparkContext and cached data.

So I think the answer to your question is that you can have multiple sessions, but there is still a single SparkContext per JVM, shared by all of your sessions.
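A rough Java model of that sharing: every session points at the same context object, while SQL configuration and temporary views live in per-session maps. All names here (NewSessionModel, Context, Session, setConf, createTempView, hasTempView) are hypothetical stand-ins rather than Spark's API, and the shared cached data mentioned in the documentation is not modeled.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of SparkSession.newSession(): sessions share one
 * context object but keep their own SQL conf and temporary views.
 */
public class NewSessionModel {

    /** Stands in for the single SparkContext shared by every session. */
    public static final class Context {}

    public static final class Session {
        private final Context context;
        // Per-session state: isolated between sessions.
        private final Map<String, String> conf = new HashMap<>();
        private final Map<String, String> tempViews = new HashMap<>();

        public Session(Context context) { this.context = context; }

        public Context context() { return context; }

        public void setConf(String key, String value) { conf.put(key, value); }

        public String getConf(String key) { return conf.get(key); }

        public void createTempView(String name, String query) { tempViews.put(name, query); }

        public boolean hasTempView(String name) { return tempViews.containsKey(name); }

        /** Same context, fresh (isolated) conf and temporary views. */
        public Session newSession() { return new Session(context); }
    }

    public static void main(String[] args) {
        Session base = new Session(new Context());
        base.setConf("spark.sql.shuffle.partitions", "10");
        base.createTempView("people", "SELECT name FROM src");

        Session other = base.newSession();
        System.out.println(other.context() == base.context()); // true
        System.out.println(other.getConf("spark.sql.shuffle.partitions")); // null
        System.out.println(other.hasTempView("people")); // false
    }
}
```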

One possible scenario for your web application could be to create one SparkSession either per request or per HTTP session, and use it to isolate Spark executions per request or per user session. Since I'm fairly new to Spark, can someone confirm this?

answered Oct 21 '22 08:10 by Peter Rietzler


If you have an existing SparkSession and want to create a new one, call the newSession method on the existing session.

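import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("demo").master("local[*]").getOrCreate() // or your existing session
val newSparkSession = spark.newSession() // shares spark.sparkContext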

The newSession method creates a new SparkSession with isolated SQL configurations and temporary tables. The new session shares the underlying SparkContext and cached data.

answered Oct 21 '22 08:10 by moriarty007


Multiple SparkContexts in one JVM are not supported and won't be: SPARK-2243 was resolved as Won't Fix.

If you need multiple contexts, there are projects that can help, such as Mist and Livy.

answered Oct 21 '22 06:10 by user6022341