How to create SQLContext in Spark using Scala?

I am creating a Scala program that uses SQLContext, built with sbt. This is my build.sbt:

name := "sampleScalaProject"

version := "1.0"

scalaVersion := "2.11.7"
//libraryDependencies += "org.apache.spark" %% "spark-core" % "2.5.2"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "1.5.2"
libraryDependencies += "org.apache.kafka" % "kafka_2.11" % "0.8.2.2"
libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "1.5.2"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "1.5.2"
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.6.0"  

And this is my test program:

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

object SqlContextSparkScala {

  def main (args: Array[String]) {
    val sc = SparkContext
    val sqlcontext = new SQLContext(sc)
  }
} 

I am getting the following error:

Error:(8, 26) overloaded method constructor SQLContext with alternatives:
  (sparkContext: org.apache.spark.api.java.JavaSparkContext)org.apache.spark.sql.SQLContext <and>
  (sparkContext: org.apache.spark.SparkContext)org.apache.spark.sql.SQLContext
 cannot be applied to (org.apache.spark.SparkContext.type)
    val sqlcontexttest = new SQLContext(sc)  

Can anybody please let me know the issue, as I am very new to Scala and Spark programming?

asked Dec 21 '15 by Amaresh


People also ask

How do I get Spark in SQLContext?

You can create an SQLContext in Spark shell by passing a default SparkContext object (sc) as a parameter to the SQLContext constructor.

How do I create a Spark session in Scala?

To create a SparkSession in Scala or Python, use the builder pattern: call builder() followed by getOrCreate(). If a SparkSession already exists, getOrCreate() returns it; otherwise it creates a new one.
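
A minimal sketch of that pattern (the app name and local master here are illustrative values, not taken from the question):

import org.apache.spark.sql.SparkSession

// getOrCreate() returns the existing SparkSession if there is one,
// otherwise it builds a new one from the settings below
val spark = SparkSession.builder()
  .appName("sampleScalaProject")   // illustrative name
  .master("local[*]")              // run locally on all cores; adjust for a cluster
  .getOrCreate()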

What is Spark SQLContext?

SQLContext is the entry point to Spark SQL, a Spark module for structured data processing. Once an SQLContext is initialised, it can be used to perform various SQL-like operations over Datasets and DataFrames.
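
For example, a short Spark 1.x sketch (the table and column names are made up for illustration):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("sqlDemo").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._   // enables .toDF on local collections

// Build a small DataFrame and query it with SQL
val people = Seq(("Alice", 29), ("Bob", 35)).toDF("name", "age")
people.registerTempTable("people")   // Spark 1.x; use createOrReplaceTempView in 2.x+
sqlContext.sql("SELECT name FROM people WHERE age > 30").show()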

What is the difference between SparkContext and SQLContext?

SparkContext is the Scala entry point, and JavaSparkContext is its Java wrapper. SQLContext is the entry point of Spark SQL and can be obtained from a SparkContext. Prior to Spark 2.x, RDD, DataFrame and Dataset were three different data abstractions.
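
The relationship can be seen in a small sketch (assuming a local Spark 1.x setup):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// SparkContext drives the low-level RDD world...
val sc = new SparkContext(new SparkConf().setAppName("rddVsDf").setMaster("local[*]"))
val rdd = sc.parallelize(Seq(("a", 1), ("b", 2)))

// ...while SQLContext, obtained from it, provides the DataFrame/SQL world
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
val df = rdd.toDF("key", "value")
df.show()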

How do I create a SQL context in spark in Scala?

In Spark 1.x, you pass a SparkContext object to the constructor to create an SQLContext instance, as in the following example:

scala> val sqlcontext = new org.apache.spark.sql.SQLContext(sc)

How to create an sqlcontext in spark shell?

You can create an SQLContext in the Spark shell by passing the default SparkContext object (sc) as a parameter to the SQLContext constructor:

scala> val sqlcontext = new org.apache.spark.sql.SQLContext(sc)

Does Spark SQL have Python support?

Full Python support will be added in a future release. The entry point into all functionality in Spark SQL is the SQLContext class, or one of its descendants. To create a basic SQLContext, all you need is a SparkContext.
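
In Scala that looks like the sketch below. Note that SparkContext must be instantiated with new and a SparkConf; referencing the bare SparkContext object is exactly the mistake in the question above:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setAppName("basicSqlContext").setMaster("local[*]")  // illustrative settings
val sc = new SparkContext(conf)   // an instance, not the SparkContext companion object
val sqlContext = new SQLContext(sc)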

How does Spark SQL work internally?

Internally, Spark SQL uses this extra information to perform extra optimizations. There are several ways to interact with Spark SQL, including SQL, the DataFrames API, and the Datasets API. When computing a result, the same execution engine is used, independent of which API/language you use to express the computation.
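
As an illustration, both queries in the sketch below are planned and executed by the same engine (the data and names are made up):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("engineDemo").master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(("Alice", 29), ("Bob", 35)).toDF("name", "age")
df.createOrReplaceTempView("people")

// Same result, same execution engine, two different front ends
spark.sql("SELECT name FROM people WHERE age > 30").show()
df.filter($"age" > 30).select("name").show()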


1 Answer

For newer versions of Spark (2.0+), use SparkSession:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()

SparkSession can do everything SQLContext can do, but if needed, the SQLContext can still be accessed as follows:

val sqlContext = spark.sqlContext
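
Applied to the question's program, a corrected sketch could look like this (the local[*] master is illustrative, and this requires the Spark 2.0+ artifacts rather than the 1.5.2 ones in the question's build.sbt):

import org.apache.spark.sql.SparkSession

object SqlContextSparkScala {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sampleScalaProject")
      .master("local[*]")              // illustrative; set appropriately for your cluster
      .getOrCreate()

    val sqlContext = spark.sqlContext  // legacy SQLContext, if an older API needs it
  }
}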
answered Oct 07 '22 by Shaido