 

spark.sql vs SqlContext

I have used SQL in Spark, in this example:

results = spark.sql("select * from ventas")

where ventas is a DataFrame, previously cataloged as a table:

df.createOrReplaceTempView('ventas')

but I have seen other ways of working with SQL in Spark, using the SQLContext class:

df = sqlContext.sql("SELECT * FROM table")

What is the difference between the two?

Thanks in advance

asked Aug 12 '18 22:08 by juanvg1972

People also ask

What is the use of sqlcontext in spark?

SQLContext, defined in the org.apache.spark.sql package since Spark 1.0, was deprecated in 2.0 and replaced by SparkSession. SQLContext contains several useful functions for working with structured data (columns and rows) and serves as an entry point to Spark SQL.

What is the difference between Spark session vs spark context vs sqlcontext?

The difference between SparkSession, SparkContext, and SQLContext depends on the Spark version an application uses. Before Spark 2.x, SparkContext was the entry point of any Spark application; from Spark 2.0 onward, SparkSession sits at the top of the hierarchy and wraps SparkContext, SQLContext, and HiveContext.

How do I get SQL context in spark shell?

Beginning in Spark 2.0, all Spark functionality, including Spark SQL, can be accessed through the SparkSession class, available as spark when you launch spark-shell. You can create a DataFrame from an RDD, a Hive table, or a data source.

What is sparkcontext in spark?

The SparkContext is used by the driver process of a Spark application to communicate with the cluster and its resource manager, and to coordinate and execute jobs. SparkContext also provides access to the other two contexts, SQLContext and HiveContext (more on these entry points below).


1 Answer

From a user's perspective (not a contributor), I can only rehash what the developers provided in the upgrade notes:

Upgrading From Spark SQL 1.6 to 2.0

  • SparkSession is now the new entry point of Spark that replaces the old SQLContext and HiveContext. Note that the old SQLContext and HiveContext are kept for backward compatibility. A new catalog interface is accessible from SparkSession - existing API on databases and tables access such as listTables, createExternalTable, dropTempView, cacheTable are moved here.

Before 2.0, using SQLContext required an extra step: you first created a SparkContext, then passed it to the SQLContext factory. With SparkSession, they made things a lot more convenient.

If you take a look at the source code, you'll notice that the SQLContext class is mostly marked @deprecated. Closer inspection shows that its most commonly used methods simply delegate to a sparkSession.

For more info, take a look at the developer notes, Jira issues, conference talks on Spark 2.0, and the Databricks blog.

answered Oct 20 '22 05:10 by emran