
Caching DataFrame in Spark Thrift Server

I have a Spark Thrift Server. I connect to it and query data from a Hive table. If I query the same table again, it loads the file into memory again and re-executes the query.

Is there any way to cache the table data using Spark Thrift Server? If so, how do I do it?

Aditya Calangutkar asked Dec 31 '25 17:12



1 Answer

Two things:

  • Use CACHE LAZY TABLE, as described in the answers to "Spark SQL: how to cache sql query result without using rdd.cache()" and "cache tables in apache spark sql".
  • Set spark.sql.hive.thriftServer.singleSession=true so that other clients connecting to the server can use the cached table.

Remember that caching is lazy, so the table is only cached the first time it is computed.
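
A minimal sketch of what this could look like from a SQL client such as beeline. The table name my_db.my_table is hypothetical, and the sketch assumes the server was started with the single-session setting passed as a --conf option to start-thriftserver.sh; adjust both to your setup.

```sql
-- Assumption: the Thrift Server was started with a shared session, e.g.
--   ./sbin/start-thriftserver.sh --conf spark.sql.hive.thriftServer.singleSession=true
-- my_db.my_table is a hypothetical table name used for illustration.

-- Mark the table for caching. Because of LAZY, nothing is loaded yet.
CACHE LAZY TABLE my_db.my_table;

-- The first query that scans the table computes it and fills the cache.
SELECT COUNT(*) FROM my_db.my_table;

-- Later queries on the same table should be served from the cached data.
SELECT * FROM my_db.my_table LIMIT 10;

-- Release the cached data when it is no longer needed.
UNCACHE TABLE my_db.my_table;
```

Dropping the LAZY keyword (CACHE TABLE my_db.my_table) caches the table eagerly at the time the statement runs, which may be preferable if the first user query should already be fast.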

T. Gawęda answered Jan 04 '26 15:01



