according my last question I have to define the Multiple SparkContext for my unique JVM.
I did it in the next way (using Java):
SparkConf conf = new SparkConf();
conf.setAppName("Spark MultipleContest Test");
conf.set("spark.driver.allowMultipleContexts", "true");
conf.setMaster("local");
After that I create the next source code:
SparkContext sc = new SparkContext(conf);
SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);
and later in the code:
JavaSparkContext ctx = new JavaSparkContext(conf);
JavaRDD<Row> testRDD = ctx.parallelize(AllList);
After the code executing I got next error message:
16/01/19 15:21:08 WARN SparkContext: Multiple running SparkContexts detected in the same JVM!
org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:81)
test.MLlib.BinarryClassification.main(BinaryClassification.java:41)
at org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:2083)
at org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:2065)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:2065)
at org.apache.spark.SparkContext$.setActiveContext(SparkContext.scala:2151)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:2023)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
at test.MLlib.BinarryClassification.main(BinaryClassification.java:105)
The numbers 41
and 105
are the lines, where both objects are defined in Java code. My question is, is it possible to execute multiple SparkContext on the same JVM and how to do it, if I already use the set
-method ?
Since the question talks about SparkSessions, it's important to point out that there can be multiple SparkSession s running but only a single SparkContext per JVM.
Notice however that SparkSession encapsulates SparkContext and since by default we can have only 1 context per JVM, all the SparkSessions from above example are represented in the UI within a single one application - usually the first launched.
Main entry point for Spark functionality. A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster. Only one SparkContext may be active per JVM. You must stop() the active SparkContext before creating a new one.
Once the SparkSession is instantiated, we can configure Spark's run-time config properties. Spark 2.0. 0 onwards, it is better to use sparkSession as it provides access to all the spark Functionalities that sparkContext does. Also, it provides APIs to work on DataFrames and Datasets.
Rather than using SparkContext
you should use builder
method on SparkSession
which more roubustly instantiates the spark and SQL context and ensures that there is not context conflict.
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("demo").getOrCreate()
Are you sure you need the JavaSparkContext as a separate context? The previous question that you refer to doesn't say so. If you already have a Spark Context you can create a new JavaSparkContext from it, rather than create a separate context:
SparkConf conf = new SparkConf();
conf.setAppName("Spark MultipleContest Test");
conf.set("spark.driver.allowMultipleContexts", "true");
conf.setMaster("local");
SparkContext sc = new SparkContext(conf);
SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);
//Create a Java Context which is the same as the scala one under the hood
JavaSparkContext.fromSparkContext(sc)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With