spark createOrReplaceTempView vs createGlobalTempView

Spark Dataset 2.0 provides two functions, createOrReplaceTempView and createGlobalTempView. I don't understand the basic difference between the two.

According to API documents:

createOrReplaceTempView: The lifetime of this temporary view is tied to the [[SparkSession]] that was used to create this Dataset.
So when I call sparkSession.close(), the view I defined will be destroyed. Is that true?

createGlobalTempView: The lifetime of this temporary view is tied to this Spark application.

When will this type of view be destroyed? Is there an example? Does it also happen on sparkSession.close()?

Rahul Sharma asked Mar 13 '17 21:03

3 Answers

The answer to your question comes down to understanding the difference between a Spark application and a SparkSession.

A Spark application can be:

  • a single batch job
  • an interactive session with multiple jobs
  • a long-lived server continually satisfying requests

Two further points about applications:

  • A Spark job can consist of more than just a single map and reduce.
  • A Spark application can consist of more than one session.

A SparkSession, on the other hand, is associated with a Spark application:

  • Generally, a session is an interaction between two or more entities.
  • Since Spark 2.0, SparkSession is the entry point you work with.
  • A SparkSession can be created without explicitly creating a SparkConf, SparkContext or SQLContext (they are encapsulated within the SparkSession).
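
The relationship described above can be sketched as follows. This is only an illustration, assuming Spark 2.x or later running in local mode; the object name SessionsInOneApp is made up for the example.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this illustration.
object SessionsInOneApp {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("Sessions in one application")
      .master("local[*]")
      .getOrCreate()

    // newSession() returns a session with its own temporary views and
    // SQL configuration, backed by the same SparkContext.
    val other = spark.newSession()

    // Two distinct session objects, so this is expected to print false.
    println(spark eq other)

    // Both sessions belong to one application, so they share one context.
    println(spark.sparkContext eq other.sparkContext)
    println(spark.sparkContext.applicationId)

    // Stopping either session stops the shared SparkContext,
    // which ends the application for both.
    spark.stop()
  }
}
```

The application ID is printed once because there is only one application, no matter how many sessions were created inside it.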

Global temporary views were introduced in the Spark 2.1.0 release. This feature is useful when you want to share data among different sessions and keep it alive until your application ends. Here is a short sample I wrote to illustrate the use of createTempView and createGlobalTempView:

import org.apache.spark.sql.SparkSession

object NewSessionApp {

  def main(args: Array[String]): Unit = {

    val logFile = "data/README.md" // Should be some file on your system
    val spark = SparkSession.
      builder.
      appName("Simple Application").
      master("local").
      getOrCreate()

    val logData = spark.read.textFile(logFile).cache()
    logData.createGlobalTempView("logdata")
    spark.range(1).createTempView("foo")

    // Within the same session, the foo view exists
    println("""spark.catalog.tableExists("foo") = """ + spark.catalog.tableExists("foo"))
    // spark.catalog.tableExists("foo") = true

    // In a new session, the foo view does not exist
    val newSpark = spark.newSession
    println("""newSpark.catalog.tableExists("foo") = """ + newSpark.catalog.tableExists("foo"))
    // newSpark.catalog.tableExists("foo") = false

    // Both sessions can access the logdata view through the global_temp database
    spark.sql("SELECT * FROM global_temp.logdata").show()
    newSpark.sql("SELECT * FROM global_temp.logdata").show()

    spark.stop()
  }
}

Avi Chalbani answered Oct 26 '22 05:10


df.createOrReplaceTempView("tempViewName")
df.createGlobalTempView("tempViewName")

createOrReplaceTempView() creates or replaces a local temporary view from the DataFrame df. The lifetime of this view is tied to the SparkSession. To drop the view explicitly:

spark.catalog.dropTempView("tempViewName")

or call stop() on the session, which also stops the underlying SparkContext:

self.ss = SparkSession(sc)
...
self.ss.stop()

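createGlobalTempView() creates a global temporary view from the DataFrame df. The lifetime of this view is tied to the Spark application itself, and the view is queried through the global_temp database (for example, global_temp.tempViewName). To drop it explicitly: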

spark.catalog.dropGlobalTempView("tempViewName")

or stop the SparkContext, which ends the application:

ss =  SparkContext(conf=conf, ......)
...
ss.stop()
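
For comparison with the snippets above, here is a rough Scala sketch of creating, querying and dropping both kinds of view. It assumes Spark 2.1 or later (where the drop methods return a Boolean) in local mode; the object name DropViewsSketch and the sample data are made up for the example.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name and data, used only for this illustration.
object DropViewsSketch {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("Drop views sketch")
      .master("local[*]")
      .getOrCreate()

    val df = spark.range(3).toDF("id")

    // Session-scoped view; calling it again with the same name replaces it.
    df.createOrReplaceTempView("tempViewName")
    df.createOrReplaceTempView("tempViewName")

    // Application-scoped view, registered under the global_temp database,
    // so it does not clash with the session-scoped view of the same name.
    df.createGlobalTempView("tempViewName")

    spark.sql("SELECT * FROM tempViewName").show()
    spark.sql("SELECT * FROM global_temp.tempViewName").show()

    // Each drop call reports whether a view with that name was removed.
    val droppedLocal = spark.catalog.dropTempView("tempViewName")
    val droppedGlobal = spark.catalog.dropGlobalTempView("tempViewName")
    println(s"droppedLocal = $droppedLocal, droppedGlobal = $droppedGlobal")

    spark.stop()
  }
}
```

Unlike createOrReplaceTempView, createGlobalTempView raises an error if a global view with that name already exists, so the sketch calls it only once.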

Gökhan Ayhan answered Oct 26 '22 05:10


createOrReplaceTempView was introduced in Spark 2.0 to replace registerTempTable. It creates an in-memory reference to the DataFrame in use, and the lifetime of that reference depends on the Spark session in which the view was created. createGlobalTempView, on the other hand, lets you create references that can be used across Spark sessions. So choose between the two methods depending on whether you need to share data across sessions.

In notebook environments, whether notebooks on the same cluster share one Spark session or each get their own depends on how the cluster is set up. In the end it comes down to where you create the DataFrame and where you want to access it.

WolfBlunt answered Oct 26 '22 06:10