
Does registerTempTable cause the table to get cached?

I have a SQL query that does a GROUP BY on many fields. The table it uses is also big (4TB in size). I'm registering the table as a temp table, but I don't know whether the table gets cached when I do that. I also don't know whether it would be more performant to convert my query into Scala function calls (e.g. df.groupBy().agg()...) rather than keeping it as a SQL statement. Any help on that?

HHH asked Nov 07 '16 21:11



2 Answers

SQL is most likely going to be the fastest by far (see the Databricks blog).

Did you try to partition/repartition your dataframe as well to see whether it improves the performance?

Regarding registerTempTable: it only registers the table under a name within the Spark context; it does not cache it. You can check this in the Storage tab of the Spark UI.

val test = List((1,2,3),(4,5,6)).toDF("bla","blb","blc")
test.createOrReplaceTempView("test")
test.show()

The Storage tab is blank.

vs

val test = List((1,2,3),(4,5,6)).toDF("bla","blb","blc")
test.createOrReplaceTempView("test")
// createOrReplaceTempView returns Unit, so cache() must be called on the DataFrame itself
test.cache()
test.show()

This time the Storage tab lists the cached data.

By the way, registerTempTable is deprecated in Spark 2.0 and has been replaced by createOrReplaceTempView.

ulrich answered Nov 04 '22 23:11



I have a SQL query that does a GROUP BY on many fields. The table it uses is also big (4TB in size). I'm registering the table as a temp table, but I don't know whether the table gets cached when I do that.

Neither registerTempTable nor createOrReplaceTempView caches the data in memory or on disk. The data is cached only if you call cache() explicitly.
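One way to check this without opening the UI is to ask the session catalog directly. The following is a minimal sketch, assuming Spark 2.x or later and a local session created only for illustration (the app name is hypothetical; in spark-shell the `spark` value already exists). It uses the same toy DataFrame as the other answer, not the 4TB table.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in spark-shell `spark` is already defined.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("temp-view-cache-check") // hypothetical app name
  .getOrCreate()
import spark.implicits._

val test = List((1, 2, 3), (4, 5, 6)).toDF("bla", "blb", "blc")

// Registering the view only adds a name to the session catalog.
test.createOrReplaceTempView("test")
val cachedAfterRegister = spark.catalog.isCached("test")

// Caching is a separate, explicit step. It is lazy: the view is marked
// for caching here, and the data is stored when an action first runs.
spark.catalog.cacheTable("test")
val cachedAfterCacheTable = spark.catalog.isCached("test")

// An action that reads the view materializes the cached data.
spark.table("test").count()

// Release the cached data once it is no longer needed.
spark.catalog.uncacheTable("test")
val cachedAfterUncache = spark.catalog.isCached("test")

println(s"after register: $cachedAfterRegister, " +
  s"after cacheTable: $cachedAfterCacheTable, " +
  s"after uncacheTable: $cachedAfterUncache")
```

For a table far larger than cluster memory, whether caching helps at all depends on how often the data is reused, so it is worth measuring before relying on it.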

I also don't know whether it would be more performant to convert my query into Scala function calls (e.g. df.groupBy().agg()...) rather than keeping it as a SQL statement. Any help on that?

Keep in mind that the clauses in a SQL query ultimately call the same functions internally. So it doesn't matter whether you write the query in SQL or use the equivalent functions available in code; it is the same thing.
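A simple way to see this for yourself is to write the same aggregation both ways and compare the plans that explain() prints. The sketch below assumes Spark 2.x or later and a local session; the column names come from the toy example in the other answer, and the alias `total` and the app name are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

// Local session for illustration only; in spark-shell `spark` is already defined.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("sql-vs-dataframe") // hypothetical app name
  .getOrCreate()
import spark.implicits._

val test = List((1, 2, 3), (1, 5, 6), (4, 5, 6)).toDF("bla", "blb", "blc")
test.createOrReplaceTempView("test")

// The same aggregation, once as SQL text and once through the DataFrame API.
val viaSql = spark.sql("SELECT bla, SUM(blc) AS total FROM test GROUP BY bla")
val viaApi = test.groupBy("bla").agg(sum("blc").as("total"))

// Print the physical plan of each query so they can be compared by eye.
viaSql.explain()
viaApi.explain()

// Both queries return the same rows.
val sqlRows = viaSql.collect().toSet
val apiRows = viaApi.collect().toSet
println(sqlRows == apiRows)
```

If the two printed plans match for your real query, any performance difference has to come from somewhere else, such as partitioning or caching.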

Ashis Parajuli answered Nov 04 '22 23:11
