
How does createOrReplaceTempView work in Spark?

I am new to Spark and Spark SQL.

How does createOrReplaceTempView work in Spark?

If we register an RDD of objects as a table, will Spark keep all the data in memory?

Abir Chokraborty asked May 16 '17 21:05





2 Answers

createOrReplaceTempView creates (or replaces, if a view with that name already exists) a lazily evaluated "view" that you can then use like a Hive table in Spark SQL. It does not persist anything to memory unless you cache the dataset that underpins the view.

scala> val s = Seq(1,2,3).toDF("num")
s: org.apache.spark.sql.DataFrame = [num: int]

scala> s.createOrReplaceTempView("nums")

scala> spark.table("nums")
res22: org.apache.spark.sql.DataFrame = [num: int]

scala> spark.table("nums").cache
res23: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [num: int]

scala> spark.table("nums").count
res24: Long = 3

The data is cached fully only after the .count call. Here's proof it's been cached:

[Image: Cached nums temp view/table]
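The cache status can also be checked programmatically rather than from a screenshot. The following is a minimal sketch, assuming a local SparkSession (in spark-shell, `spark` already exists and the builder line can be dropped). Note that `spark.catalog.isCached` reports whether the view's plan is registered for caching; the data itself is materialized by the first action, as described above.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session created just for this sketch.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("temp-view-cache-check")
  .getOrCreate()
import spark.implicits._

val s = Seq(1, 2, 3).toDF("num")
s.createOrReplaceTempView("nums")

// Registering the view alone does not cache anything.
val cachedBefore = spark.catalog.isCached("nums")

// Mark the view's underlying plan for caching, then run an action
// so that the cached data is actually materialized.
spark.table("nums").cache()
val rowCount = spark.table("nums").count()
val cachedAfter = spark.catalog.isCached("nums")

println(s"cached before: $cachedBefore, cached after: $cachedAfter, rows: $rowCount")

// Release the cached data once it is no longer needed.
spark.catalog.uncacheTable("nums")
```

Calling `uncacheTable` at the end is optional here; it only illustrates that caching is a separate, reversible step from registering the view.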

Related SO: spark createOrReplaceTempView vs createGlobalTempView
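To illustrate the difference that question is about, here is a rough sketch, assuming a local SparkSession and the hypothetical view names `nums` and `nums_global`. A view made with createOrReplaceTempView is scoped to the session that registered it, while a global temporary view lives in the reserved `global_temp` database and can be read from other sessions of the same application.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

// Assumption: a local session created just for this sketch.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("temp-view-scope")
  .getOrCreate()
import spark.implicits._

val s = Seq(1, 2, 3).toDF("num")

// Session-scoped view: resolvable only through the session that registered it.
s.createOrReplaceTempView("nums")

// Global temporary view: registered under the `global_temp` database.
s.createOrReplaceGlobalTempView("nums_global")

// A second session that shares the same SparkContext.
val other = spark.newSession()

// Looking up the session-scoped view from the other session should fail.
val localVisibleInOther = Try(other.table("nums")).isSuccess

// The global view must be qualified with `global_temp`.
val globalCount = other.sql("SELECT * FROM global_temp.nums_global").count()

println(s"nums visible in other session: $localVisibleInOther")
println(s"rows read from global_temp.nums_global: $globalCount")
```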

Relevant quote (comparing to persistent table): "Unlike the createOrReplaceTempView command, saveAsTable will materialize the contents of the DataFrame and create a pointer to the data in the Hive metastore." from https://spark.apache.org/docs/latest/sql-programming-guide.html#saving-to-persistent-tables
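The contrast in that quote can be sketched as follows. This is only an illustration, assuming a local SparkSession without Hive support (so the table is written under the default warehouse directory rather than a real Hive metastore) and a hypothetical table name `nums_saved`.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session created just for this sketch.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("temp-view-vs-saved-table")
  .getOrCreate()
import spark.implicits._

val s = Seq(1, 2, 3).toDF("num")

// Temp view: only a named reference to the DataFrame's query plan.
s.createOrReplaceTempView("nums")

// Persistent table: the rows are computed and written to storage.
s.write.mode("overwrite").saveAsTable("nums_saved")

// The catalog lists both, and flags which one is temporary.
val tables = spark.catalog.listTables().collect()
tables.foreach(t => println(s"${t.name}: temporary=${t.isTemporary}"))

// Clean up the hypothetical table so the sketch can be re-run.
spark.sql("DROP TABLE IF EXISTS nums_saved")
```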

Note: createOrReplaceTempView replaces registerTempTable, which was deprecated in Spark 2.0.

Garren S answered Sep 29 '22 00:09



createOrReplaceTempView creates a temporary view of the DataFrame in the current session. Nothing is persisted at this point, but you can run SQL queries against the view. If you want to keep the data, you can either persist (cache) the DataFrame or write it out with saveAsTable.

First, we read the data from a .csv file into a DataFrame and then create a temp view.

Reading data in .csv format

val data = spark.read.format("csv").option("header","true").option("inferSchema","true").load("FileStore/tables/pzufk5ib1500654887654/campaign.csv") 

Printing the schema

data.printSchema 

[Image: SchemaOfTable]

data.createOrReplaceTempView("Data") 

Now we can run SQL queries on top of the table view we just created

  %sql SELECT Week AS Date, `Campaign Type`, Engagements, Country FROM Data ORDER BY Date ASC

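Outside a notebook, where the %sql magic is not available, the same query can be run through spark.sql. The sketch below assumes a local SparkSession and uses a small made-up DataFrame in place of campaign.csv; the rows are hypothetical, and it is an assumption that the CSV has a column literally named "Campaign Type" (a name containing a space has to be quoted with backticks in Spark SQL).

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session created just for this sketch.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("temp-view-sql")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in rows; the real data would come from campaign.csv.
val data = Seq(
  ("2017-01-09", "Social", 95, "CA"),
  ("2017-01-02", "Email", 120, "US")
).toDF("Week", "Campaign Type", "Engagements", "Country")

data.createOrReplaceTempView("Data")

// Same query as the %sql cell, issued from Scala code.
val result = spark.sql(
  """SELECT Week AS Date, `Campaign Type`, Engagements, Country
    |FROM Data
    |ORDER BY Date ASC""".stripMargin)

result.show()
```

The value returned by spark.sql is an ordinary DataFrame, so it can be cached, written out, or registered as another temp view in the same way as `data`.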

RajenDharmendra answered Sep 28 '22 22:09
