Best practice to create SparkSession object in Scala to use both in unittest and spark-submit

I am writing a transform method from DataFrame to DataFrame, and I also want to test it with ScalaTest.

As you know, in Spark 2.x with the Scala API, you can create a SparkSession object as follows:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
     .config("spark.master", "local[2]")
     .getOrCreate()

This code works fine in unit tests. But when I run it with spark-submit, the cluster options have no effect. For example,

spark-submit --master yarn --deploy-mode client --num-executors 10 ...

does not create any executors.

I have found that the spark-submit arguments are applied when I remove the config("spark.master", "local[2]") part of the above code. But without the master setting, the unit test code does not work.

I tried to split the creation of the spark (SparkSession) object between test and main. But there are many code blocks that need spark, for example import spark.implicits._ and spark.createDataFrame(rdd, schema).

Is there a best practice for creating the spark object so that it works both in tests and with spark-submit?

asked Jul 31 '17 04:07

Joo-Won Jung

2 Answers

One way is to create a trait which provides the SparkContext/SparkSession, and use that in your test cases, like so:

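import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SparkSession}

trait SparkTestContext {
  private val master = "local[*]"
  private val appName = "testing"
  // Only relevant on Windows, where Hadoop needs winutils; adjust or drop elsewhere
  System.setProperty("hadoop.home.dir", "c:\\winutils\\")
  private val conf: SparkConf = new SparkConf()
    .setMaster(master)
    .setAppName(appName)
    .set("spark.driver.allowMultipleContexts", "false")
    .set("spark.ui.enabled", "false")

  // enableHiveSupport() requires the spark-hive module on the test classpath
  val ss: SparkSession = SparkSession.builder().config(conf).enableHiveSupport().getOrCreate()
  val sc: SparkContext = ss.sparkContext
  val sqlContext: SQLContext = ss.sqlContext
}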

Your test class header then looks like this, for example:

class TestWithSparkTest extends BaseSpec with SparkTestContext with Matchers{
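For the spark-submit side of the question, here is a minimal sketch of one possible approach, not a definitive implementation. Properties set directly on a SparkConf in code take precedence over flags passed to spark-submit, which is why a hard-coded master hides --master yarn. The sketch uses SparkConf.setIfMissing so that the local master is applied only when nothing was supplied from outside. The object name SparkSessionProvider, the fallback values, and the method name buildConf are hypothetical names chosen for illustration.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Hypothetical helper object; the name and the defaults are assumptions.
object SparkSessionProvider {
  // Fallbacks used only when no value arrives from outside (e.g. in unit tests).
  val FallbackMaster = "local[2]"
  val FallbackAppName = "local-run"

  def buildConf(): SparkConf =
    // new SparkConf() loads all spark.* JVM system properties, which is how
    // the options given to spark-submit reach the driver. setIfMissing leaves
    // those values alone and fills in the fallback only when the key is absent.
    new SparkConf()
      .setIfMissing("spark.master", FallbackMaster)
      .setIfMissing("spark.app.name", FallbackAppName)

  // Shared session for code that needs spark.implicits._ or createDataFrame.
  lazy val spark: SparkSession =
    SparkSession.builder().config(buildConf()).getOrCreate()
}
```

With this shape, application code can write import SparkSessionProvider.spark and import spark.implicits._ in both environments: under spark-submit the submitted master is kept, and in a plain test run the local fallback is used.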

answered Nov 10 '22 14:11

Rick Moritz

I made a version in which Spark shuts down cleanly after the tests.

import org.apache.spark.sql.{SQLContext, SparkSession}
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, FunSuite, Matchers}

trait SparkTest extends FunSuite with BeforeAndAfterAll with Matchers {
  var ss: SparkSession = _
  var sc: SparkContext = _
  var sqlContext: SQLContext = _

  override def beforeAll(): Unit = {
    val master = "local[*]"
    val appName = "MyApp"
    val conf: SparkConf = new SparkConf()
      .setMaster(master)
      .setAppName(appName)
      .set("spark.driver.allowMultipleContexts", "false")
      .set("spark.ui.enabled", "false")

    ss = SparkSession.builder().config(conf).getOrCreate()

    sc = ss.sparkContext
    sqlContext = ss.sqlContext
    super.beforeAll()
  }

  override def afterAll(): Unit = {
    sc.stop()
    super.afterAll()
  }
}
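As a usage sketch, a suite built on the SparkTest trait above might look like the following. It assumes that trait is on the test classpath. The Transforms object and its withDoubled method are hypothetical stand-ins for the DataFrame-to-DataFrame transform from the question. Because ss is a var, it is copied into a local val first: Scala only allows imports from a stable identifier.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical transform under test; stands in for the real one.
object Transforms {
  def withDoubled(df: DataFrame): DataFrame =
    df.withColumn("doubled", col("value") * 2)
}

class TransformsTest extends SparkTest {
  test("withDoubled adds a column holding twice the value") {
    // ss is a var, so bind it to a val before importing its implicits.
    val spark = ss
    import spark.implicits._

    val input = Seq(1, 2, 3).toDF("value")
    val result = Transforms.withDoubled(input)

    // Sort after collecting because row order is not guaranteed.
    result.select("doubled").as[Int].collect().sorted.toSeq shouldBe Seq(2, 4, 6)
  }
}
```

The transform itself never touches a global spark object, so the same method can be called from a main class launched through spark-submit.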

answered Nov 10 '22 15:11

Karima Rafes

Tags:

scala

apache-spark

spark-submit
