Is a table registered with <code>registerTempTable</code> (<code>createOrReplaceTempView</code> in Spark 2.0+) cached? In Zeppelin, I register a <code>DataFrame</code> in my Scala code after a heavy computation, and then in <code>%pyspark</code> I want to access it and filter it further. Will it use a memory-cached version of the table, or will it be rebuilt each time?


Temp table caching with spark-sql

1 Answer

Registered tables are not cached in memory.

The ~~registerTempTable~~ createOrReplaceTempView method only creates or replaces a view of the given DataFrame with a given query plan. Nothing is computed or stored at that point, so every query against the view re-runs the plan.

The query plan is converted to a canonicalized SQL string and stored as view text in the metastore only when a permanent view is created.

You need to cache your DataFrame explicitly. The cache is filled lazily, by the first action that reads the table. For example:

df.createOrReplaceTempView("my_table") # df.registerTempTable("my_table") before Spark 2.0
spark.catalog.cacheTable("my_table") # sqlContext.cacheTable("my_table") before Spark 2.0
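
To check from code whether the view is really cached, here is a minimal sketch, assuming Spark 2.0+ and a local session created only for the demonstration. The object name TempViewCacheCheck, the helper cacheStates and the column names are hypothetical and not part of the original answer; spark.catalog.isCached reports whether the named table is currently registered in the cache.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo object; the names are illustrative only.
object TempViewCacheCheck {

  // Returns whether "my_table" is cached (before cacheTable, after cacheTable).
  def cacheStates(spark: SparkSession): (Boolean, Boolean) = {
    import spark.implicits._

    val df = Seq(("1", 2), ("b", 3)).toDF("key", "value")

    // Registering the view only stores the query plan under a name.
    df.createOrReplaceTempView("my_table")
    val before = spark.catalog.isCached("my_table")

    // Mark the view for caching; the data is materialised by the first action.
    spark.catalog.cacheTable("my_table")
    val after = spark.catalog.isCached("my_table")

    // An action on the view populates the cache.
    spark.table("my_table").filter($"value" > 2).count()

    // Release the cached data once it is no longer needed.
    spark.catalog.uncacheTable("my_table")
    (before, after)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("temp-view-cache-check")
      .getOrCreate()

    val (before, after) = cacheStates(spark)
    println(s"cached before cacheTable: $before, after: $after")
    spark.stop()
  }
}
```

In a notebook where a session named spark is already provided, only the body of cacheStates should be needed.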

EDIT:

Let's illustrate this with an example:

Using cacheTable:

scala> val df = Seq(("1",2),("b",3)).toDF
// df: org.apache.spark.sql.DataFrame = [_1: string, _2: int]

scala> sc.getPersistentRDDs
// res0: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map()

scala> df.createOrReplaceTempView("my_table")

scala> sc.getPersistentRDDs
// res2: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map()

scala> spark.catalog.cacheTable("my_table") // sqlContext.cacheTable("...") before Spark 2.0

scala> sc.getPersistentRDDs
// res4: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(2 -> In-memory table my_table MapPartitionsRDD[2] at cacheTable at <console>:26)

Now the same example using ~~cache.registerTempTable~~ cache.createOrReplaceTempView:

scala> sc.getPersistentRDDs
// res2: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map()

scala> val df = Seq(("1",2),("b",3)).toDF
// df: org.apache.spark.sql.DataFrame = [_1: string, _2: int]

scala> df.createOrReplaceTempView("my_table")

scala> sc.getPersistentRDDs
// res4: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map()

scala> df.cache.createOrReplaceTempView("my_table")

scala> sc.getPersistentRDDs
// res6: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = 
// Map(2 -> ConvertToUnsafe
// +- LocalTableScan [_1#0,_2#1], [[1,2],[b,3]]
//  MapPartitionsRDD[2] at cache at <console>:28)
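
The DataFrame-side route can also be checked without inspecting sc.getPersistentRDDs. The following is a rough sketch, assuming Spark 2.1+ (where Dataset.storageLevel is available) and a local session used only for the demonstration. The object name CachedViewSketch, the helper run and the column names are hypothetical. It caches the DataFrame before registering it, filters the view through SQL, and then releases the cache.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo object; the names are illustrative only.
object CachedViewSketch {

  // Returns (uses memory before cache, uses memory after cache, filtered row count).
  def run(spark: SparkSession): (Boolean, Boolean, Long) = {
    import spark.implicits._

    val df = Seq(("1", 2), ("b", 3)).toDF("key", "value")

    // A DataFrame that was never cached reports StorageLevel.NONE.
    val before = df.storageLevel.useMemory

    // cache() marks the DataFrame for caching and returns it,
    // so the view can be registered in the same expression.
    df.cache().createOrReplaceTempView("my_table")
    val after = df.storageLevel.useMemory

    // Queries against the view can now be served from the cached data;
    // this first action is what actually fills the cache.
    val matching = spark.sql("SELECT * FROM my_table WHERE value > 2").count()

    // Drop the cached blocks when they are no longer needed.
    df.unpersist()
    (before, after, matching)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cached-view-sketch")
      .getOrCreate()

    println(run(spark))
    spark.stop()
  }
}
```

Either route ends with the same plan being cached; cacheTable works from the view name, while cache works from the DataFrame reference, so the choice mostly depends on which of the two is at hand.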


answered Sep 18 '22 16:09

eliasah


Cedric H.
