 

Spark SQL registerTempTable and registerDataFrameAsTable difference

What is the difference between the registerTempTable and registerDataFrameAsTable methods in Spark SQL, and which one is better in which scenario?

mandar asked Jul 24 '15 04:07


1 Answer

Spark >= 2.1

There is a new createGlobalTempView method, which can be used to register cross-session views:

Its lifetime is the lifetime of the Spark application, i.e. it will be automatically dropped when the application terminates. It's tied to a system preserved database global_temp, and we must use the qualified name to refer to a global temp view, e.g. SELECT * FROM global_temp.view1.
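A minimal PySpark sketch of the cross-session behaviour, assuming a local Spark >= 2.1 installation and the default global temp database name global_temp (it can be changed through spark.sql.globalTempDatabase). The helper global_view_name, the demo function, and the sample rows are made up for illustration:

```python
def global_view_name(view, database="global_temp"):
    """Qualify a global temp view name with its database.

    "global_temp" is the default database name assumed here.
    """
    return f"{database}.{view}"


def demo():
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("global-temp-view-demo")
        .getOrCreate()
    )
    try:
        # Hypothetical sample data.
        df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
        df.createGlobalTempView("view1")

        query = f"SELECT * FROM {global_view_name('view1')}"
        # Visible in the session that created it ...
        rows_here = spark.sql(query).count()
        # ... and in a fresh session of the same application.
        rows_elsewhere = spark.newSession().sql(query).count()
        return rows_here, rows_elsewhere
    finally:
        spark.catalog.dropGlobalTempView("view1")
        spark.stop()
```

Calling demo() should return the same row count twice. Querying the unqualified name view1 instead would fail, because the view lives in the global temp database rather than in the current one.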

Spark >= 2.0

registerTempTable has been deprecated in favor of createTempView and createOrReplaceTempView, with the former throwing an exception if the view already exists.
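The difference between the two replacements can be sketched in PySpark as follows, assuming a local Spark >= 2.0 installation. The register_view wrapper, the demo function, the view name people, and the sample rows are hypothetical and exist only for this illustration:

```python
def register_view(df, name, replace=False):
    """Register `df` as a session-scoped temporary view called `name`."""
    if replace:
        # Overwrites any existing temp view with the same name.
        df.createOrReplaceTempView(name)
    else:
        # Fails if a temp view called `name` is already registered.
        df.createTempView(name)
    return name


def demo():
    # Imported here so register_view itself has no hard Spark dependency.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("temp-view-demo")
        .getOrCreate()
    )
    try:
        df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
        register_view(df, "people")
        try:
            register_view(df, "people")
            second_call_failed = False
        except Exception:
            # Usually an AnalysisException; the concrete class may vary
            # with the PySpark version, so this sketch catches broadly.
            second_call_failed = True
        # Replacing the view succeeds even though "people" already exists.
        register_view(df.limit(1), "people", replace=True)
        return second_call_failed, spark.sql("SELECT * FROM people").count()
    finally:
        spark.stop()
```

createOrReplaceTempView is the closest match to the old registerTempTable behaviour, which also replaced an existing table silently; createTempView is the safer choice when an accidental name clash should surface as an error.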

Spark < 2.0

PySpark

While there is no practical difference between the two, they differ in how the call reaches the JVM:

  • SQLContext.registerDataFrameAsTable calls the registerDataFrameAsTable method on the JVM SQL context directly
  • DataFrame.registerTempTable calls registerTempTable on the JVM data frame, which in turn calls the registerDataFrameAsTable method on the JVM SQL context

Scala

  • DataFrame.registerTempTable calls the registerDataFrameAsTable method on the SQL context
  • SQLContext.registerDataFrameAsTable is a private method, not accessible outside the org.apache.spark.sql package

To keep things simple, it is probably a good idea to stick to registerTempTable in PySpark as well.

zero323 answered Oct 24 '22 23:10