
import implicit conversions without instance of SparkSession

My Spark code is cluttered with code like this:

object Transformations {   
  def selectI(df:DataFrame) : DataFrame = {    
    // needed to use $ to generate ColumnName
    import df.sparkSession.implicits._

    df.select($"i")
  }
}

or, alternatively:

object Transformations {   
  def selectI(df:DataFrame)(implicit spark:SparkSession) : DataFrame = {    
    // needed to use $ to generate ColumnName
    import spark.implicits._

    df.select($"i")
  }
}

I don't really understand why we need an instance of SparkSession just to import these implicit conversions. I would rather do something like:

object Transformations {  
  import org.apache.spark.sql.SQLImplicits._ // does not work

  def selectI(df:DataFrame) : DataFrame = {    
    df.select($"i")
  }
}

Is there an elegant solution for this problem? My use of the implicits is not limited to $; I also rely on Encoders, .toDF(), etc.

Raphael Roth asked Jan 27 '23 21:01


1 Answer

I don't really understand why we need an instance of SparkSession just to import these implicit conversions. I would rather like to do something like

Because every Dataset exists in the scope of a specific SparkSession, and a single Spark application can have multiple active SparkSession instances.
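A minimal sketch of that point, assuming Spark SQL is on the classpath and a local master is acceptable. The object and method names (`SessionScopeDemo`, `ownership`) are hypothetical and only serve the illustration. It creates a second session with `newSession()` and checks which session a DataFrame built through `spark.implicits._` is attached to:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo object, not part of Spark.
object SessionScopeDemo {

  // Returns (df is attached to `spark`, df is attached to `other`).
  def ownership(spark: SparkSession): (Boolean, Boolean) = {
    // A second session that shares the same SparkContext.
    val other = spark.newSession()

    // `implicits` is a member of this particular session instance,
    // so toDF creates the DataFrame through `spark`, not `other`.
    import spark.implicits._
    val df = Seq(1, 2, 3).toDF("i")

    (df.sparkSession eq spark, df.sparkSession eq other)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]") // assumption: local mode for the demo
      .appName("session-scope-demo")
      .getOrCreate()
    try println(ownership(spark))
    finally spark.stop()
  }
}
```

Because `toDF` has to create the Dataset in some session, the import has to name the instance that should own it.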

Theoretically, some of the SparkSession.implicits._ members could exist separately from the session instance, for example:

import org.apache.spark.sql.implicits._   // For let's say `$` or `Encoders`
import org.apache.spark.sql.SparkSession.builder.getOrCreate.implicits._  // For toDF

but it would have a significant impact on the user code.
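For the parts that do not depend on a session, existing session-free APIs can stand in for the implicits. The sketch below assumes Spark SQL is on the classpath; `SessionFreeTransformations` and `asInts` are hypothetical names used only for illustration. It uses `functions.col` in place of `$` and passes an encoder from `org.apache.spark.sql.Encoders` explicitly instead of resolving it implicitly:

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Encoders}
import org.apache.spark.sql.functions.col

// Hypothetical helper object, not part of Spark.
object SessionFreeTransformations {

  // col("i") refers to the same column as $"i" but needs no implicit import.
  def selectI(df: DataFrame): DataFrame =
    df.select(col("i"))

  // The encoder is passed explicitly rather than found via spark.implicits._.
  // Assumes column "i" holds integers.
  def asInts(df: DataFrame): Dataset[Int] =
    selectI(df).as[Int](Encoders.scalaInt)
}
```

This does not cover `.toDF()` on local collections, which still has to go through a session (for example `df.sparkSession` or an explicitly passed `SparkSession`).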

user9977445 answered Feb 16 '23 03:02