Duplicated Spark Context with IntelliJ in Worksheet

I have the following worksheet in IntelliJ:

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

/** Lazily instantiated singleton instance of SQLContext */
object SQLContextSingleton {
  @transient  private var instance: SQLContext = _
  def getInstance(sparkContext: SparkContext): SQLContext = {
    if (instance == null) {
      instance = new SQLContext(sparkContext)
    }
    instance
  }
}

val conf = new SparkConf().
  setAppName("Scala Worksheet").
  setMaster("local[*]")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
val df = sqlContext.read.json("/Users/someuser/some.json")
df.show

This code works in the REPL, but in the worksheet it seems to run only the first time (and even then with some other errors). On each subsequent run, the error is:

16/04/13 11:04:57 WARN SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor).  This may indicate an error, since only one SparkContext may be running in this JVM (see SPARK-2243). The other SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:82)

How can I find the context already in use?

Note: I have heard others suggest conf.set("spark.driver.allowMultipleContexts", "true"), but that looks like a workaround that just increases memory usage, since the old contexts stay around like uncollected garbage.

Is there a better way?

asked Apr 13 '16 18:04

codeaperature

1 Answer

I had the same problem when trying to run Spark code in a Scala Worksheet in IntelliJ IDEA (CE 2016.3.4).

The fix for the duplicate Spark context was to uncheck the 'Run worksheet in the compiler process' checkbox in Settings -> Languages and Frameworks -> Scala -> Worksheet. I also tested the other Worksheet settings, and none of them had any effect on the duplicate Spark context problem.

I did not put sc.stop() in the Worksheet. However, I did have to set the master and appName parameters in the conf for it to work.

Here is the Worksheet version of the code from SimpleApp.scala in the Spark Quick Start:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
conf.setMaster("local[*]")
conf.setAppName("Simple Application")

val sc = new SparkContext(conf)

val logFile = "/opt/spark-latest/README.md"
val logData = sc.textFile(logFile).cache()
val numAs = logData.filter(line => line.contains("a")).count()
val numBs = logData.filter(line => line.contains("b")).count()

println(s"Lines with a: $numAs, Lines with b: $numBs")

I used the same simple.sbt from the guide to import the dependencies into IntelliJ IDEA.
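For reference, a simple.sbt in the style of the Quick Start guide might look roughly like the sketch below. The project name and all version numbers here are assumptions for illustration, not the exact values used above; match them to the Spark and Scala versions you actually have installed. The spark-sql line is only needed for code that uses SQLContext, such as the worksheet in the question.

```scala
// simple.sbt (sketch; name and versions are assumed, adjust to your setup)
name := "Simple Project"

version := "1.0"

// Must match the Scala version your Spark build was compiled against
scalaVersion := "2.11.8"

// %% appends the Scala binary version to the artifact name
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"

// Only needed if the worksheet uses SQLContext / DataFrames
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0"
```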

[Screenshot: the functioning Scala Worksheet with Spark]

UPDATE for IntelliJ CE 2017.1 (Worksheet in REPL mode)

In 2017.1, IntelliJ introduced a REPL mode for Worksheets. I tested the same code with the 'Use REPL' option checked. For this mode to run, the 'Run worksheet in the compiler process' checkbox in the Worksheet settings described above must stay checked (which is the default).

The code runs fine in Worksheet REPL mode.

[Screenshot: Apache Spark running in IntelliJ Scala Worksheet REPL mode]
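On the original question of how to find the context already in use: Spark itself provides SparkContext.getOrCreate (Spark 1.4+) and SQLContext.getOrCreate (Spark 1.5+), which return the context already registered in the JVM instead of constructing a second one. This is a minimal sketch and was not part of the setup tested above. Whether it helps in a worksheet depends on the worksheet reusing the same JVM and classloader between runs, which is an assumption here. The app name is only illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf()
conf.setMaster("local[*]")
conf.setAppName("Scala Worksheet") // illustrative name

// Returns the SparkContext already active in this JVM, if any;
// otherwise creates a new one from conf.
val sc = SparkContext.getOrCreate(conf)

// Same idea for SQLContext (deprecated since Spark 2.0 in favour of SparkSession).
val sqlContext = SQLContext.getOrCreate(sc)

// A second call hands back the same instance rather than building another context.
val sameSc = SparkContext.getOrCreate(conf)
println(sc eq sameSc)
```

This would also make the hand-written SQLContextSingleton object from the question unnecessary, since the lookup is handled by Spark.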

answered Oct 30 '22 15:10

tomaskazemekas

Tags:

intellij-idea

scala

apache-spark

apache-spark-sql
