 

clearCache in pyspark without SQLContext

The PySpark documentation for SQLContext says "As of Spark 2.0, this is replaced by SparkSession."

How can I remove all cached tables from the in-memory cache without using SQLContext?

For example, where spark is a SparkSession and sc is a SparkContext:

from pyspark.sql import SQLContext
SQLContext(sc, spark).clearCache()
Clay asked Jun 06 '26 02:06



1 Answer

In PySpark, one place clearCache is exposed is SQLContext. The example below obtains an instance with SQLContext.getOrCreate from an existing SparkContext instance:

SQLContext.getOrCreate(sc).clearCache()

In Scala, though, there is a more direct way to achieve the same via SparkSession (note that sharedState is marked as an unstable, internal API):

spark.sharedState.cacheManager.clearCache()

One more option, through the catalog, as Clay mentioned:

spark.catalog.clearCache()
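Since the question is about PySpark, here is a minimal sketch of the catalog route in Python. It assumes `spark` is an existing SparkSession (Spark 2.0 or later); the helper name `clear_cache_and_report` is made up for illustration, and the list it returns only covers tables and temp views visible in the current database, not DataFrames cached without a name.

```python
def clear_cache_and_report(spark):
    """Clear the session's in-memory cache without touching SQLContext.

    `spark` is assumed to be a pyspark.sql.SparkSession.
    Returns the names of tables/views that were reported as cached
    just before the cache was cleared (hypothetical helper).
    """
    catalog = spark.catalog
    # Collect the names the catalog currently reports as cached.
    cached_before = [
        table.name
        for table in catalog.listTables()
        if catalog.isCached(table.name)
    ]
    # Remove all cached tables from the in-memory cache.
    catalog.clearCache()
    return cached_before


# Usage, assuming an active session named `spark`:
#   dropped = clear_cache_and_report(spark)
#   print(dropped)
```

If the report is not needed, the single call `spark.catalog.clearCache()` is enough on its own.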

And the last one, from Jacek Laskowski's GitBook:

spark.sql("CLEAR CACHE").collect()

Reference: https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-caching-and-persistence.html
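The SQL statement can also be issued from PySpark. The sketch below assumes `spark` is an existing SparkSession; the wrapper name `clear_cache_sql` is hypothetical. As far as I know, Spark runs command statements such as CLEAR CACHE as soon as `spark.sql` is called, so the trailing `collect()` is kept only to mirror the snippet above.

```python
def clear_cache_sql(spark):
    """Clear the in-memory cache with the CLEAR CACHE SQL statement.

    `spark` is assumed to be a pyspark.sql.SparkSession.
    Returns whatever rows the statement yields (hypothetical helper).
    """
    # No SQLContext is involved; the statement goes through the session.
    return spark.sql("CLEAR CACHE").collect()


# Usage, assuming an active session named `spark`:
#   clear_cache_sql(spark)
```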

abiratsis answered Jun 08 '26 16:06
