 

SQLContext implicits

I am learning spark and scala. I am well versed in java, but not so much in scala. I am going through a tutorial on spark, and came across the following line of code, which has not been explained:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

(sc is the SparkContext instance)

I know the concepts behind scala implicits (at least I think I know). Could somebody explain to me what exactly is meant by the import statement above? What implicits are bound to the sqlContext instance when it is instantiated, and how? Are these implicits defined inside the SQLContext class?
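To see how an import from an *instance* works in plain Scala (no Spark), here is a minimal sketch of the same pattern SQLContext uses: an `implicits` object nested inside a class, whose members only become usable after you import them from a concrete instance. The names `Context`, `RichInt`, and `twice` are made up for illustration.

```scala
// A class with a nested `implicits` object, mimicking SQLContext's layout.
class Context {
  object implicits {
    // Hypothetical implicit: adds a `.twice` method to Int,
    // just as Spark's implicits add `.toDF()` to RDDs.
    implicit class RichInt(val n: Int) {
      def twice: Int = n * 2
    }
  }
}

val ctx = new Context
import ctx.implicits._ // brings the implicits bound to *this* instance into scope

println(3.twice) // works only after the import above
```

Without `import ctx.implicits._`, the call `3.twice` does not compile, because the implicit conversion is a member of the instance's nested object, not of the enclosing scope.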

EDIT The following seems to work for me as well (fresh code):

val sqlc = new SQLContext(sc)
import sqlContext.implicits._

In the code just above, what exactly is sqlContext and where is it defined?

Ankit Khettry asked Mar 09 '16 07:03

1 Answer

From the ScalaDoc: sqlContext.implicits contains "(Scala-specific) Implicit methods available in Scala for converting common Scala objects into DataFrames."

It is also explained in the Spark programming guide:

// this is used to implicitly convert an RDD to a DataFrame.
import sqlContext.implicits._

For example, in the code below .toDF() won't work unless you import sqlContext.implicits:

// assumes `Airport` is a case class defined elsewhere in the tutorial
import scala.io.Source

val airports = sc.makeRDD(Source.fromFile(airportsPath).getLines().drop(1).toSeq, 1)
    .map(s => s.replaceAll("\"", "").split(","))
    .map(a => Airport(a(0), a(1), a(2), a(3), a(4), a(5), a(6)))
    .toDF()

What implicits are bound to the sqlContext instance when it is instantiated and how? Are these implicits defined inside the SQLContext class?

Yes, they are defined in the object implicits inside the SQLContext class, which extends SQLImplicits. It looks like there are two types of implicit conversions defined there:

  1. An RDD to DataFrameHolder conversion, which enables the above-mentioned rdd.toDF().
  2. Various instances of Encoder which are "Used to convert a JVM object of type T to and from the internal Spark SQL representation."
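The first conversion can be sketched in plain Scala (no Spark) like so: an implicit method wraps a collection in a holder class that exposes toDF(). The names mirror Spark's DataFrameHolder, but the types here are simplified stand-ins, not Spark's real ones.

```scala
import scala.language.implicitConversions

// Simplified stand-in for Spark's DataFrameHolder: wraps data and
// exposes toDF(). Real Spark builds a DataFrame here instead.
case class DataFrameHolder[T](data: Seq[T]) {
  def toDF(): Seq[T] = data
}

object implicitsSketch {
  // The implicit conversion: any Seq can be wrapped so that
  // `.toDF()` appears to be a method on it.
  implicit def seqToHolder[T](s: Seq[T]): DataFrameHolder[T] =
    DataFrameHolder(s)
}

import implicitsSketch._
val rows = Seq(1, 2, 3)
println(rows.toDF()) // compiles because the implicit supplies toDF()
```

This is why `rdd.toDF()` only compiles after `import sqlContext.implicits._`: the compiler inserts the conversion to the holder type, and the holder provides the method.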
vitalii answered Oct 20 '22 13:10