 

How can I check whether my RDD or dataframe is cached or not?

I have created a DataFrame, say df1, and cached it using df1.cache(). How can I check whether it has actually been cached? Also, is there a way to see all my cached RDDs and DataFrames?

Asked Sep 07 '15 by Arnab

People also ask

How do you check if a DataFrame is cached or not?

You can check storageLevel.useMemory on a DataFrame, or getStorageLevel.useMemory on an RDD, to find out whether the dataset is in memory.

What is caching in RDD?

Both caching and persisting are used to save Spark RDDs, DataFrames, and Datasets. The difference is that cache() saves to the default storage level (MEMORY_ONLY for RDDs), whereas persist() lets you store at a user-defined storage level.

What is cache () default storage level for RDD?

You can mark an RDD to be persisted using the persist() or cache() methods on it. Each persisted RDD can be stored using a different storage level. The cache() method is shorthand for the default storage level, StorageLevel.MEMORY_ONLY (store deserialized objects in memory).

What does cache () do in Pyspark?

cache() is a lazy Spark operation that can be used on a DataFrame, Dataset, or RDD when you want to perform more than one action on it. cache() marks the specified DataFrame, Dataset, or RDD to be kept in the memory of your cluster's workers; the data is materialized when the first action runs.
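Because cache() is lazy, its effect only becomes visible once an action runs. A minimal PySpark sketch (assuming an active SparkSession bound to the name `spark`; note that for DataFrames the default storage level is MEMORY_AND_DISK in Spark 2.x+, not MEMORY_ONLY):

```python
# Sketch (PySpark), assuming an active SparkSession named `spark`.
df = spark.range(100)
df.cache()           # lazy: marks df for caching, nothing is stored yet
df.count()           # the first action actually materializes the cache
print(df.is_cached)  # True (df is marked as cached; data fills in on the first action)
```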


2 Answers

You can call storageLevel.useMemory on a DataFrame, or getStorageLevel.useMemory on an RDD, to find out if the dataset is in memory.

For the Dataframe do this:

scala> val df = Seq(1, 2).toDF()
df: org.apache.spark.sql.DataFrame = [value: int]

scala> df.storageLevel.useMemory
res0: Boolean = false

scala> df.cache()
res1: df.type = [value: int]

scala> df.storageLevel.useMemory
res2: Boolean = true

For the RDD do this:

scala> val rdd = sc.parallelize(Seq(1,2))
rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[1] at parallelize at <console>:21

scala> rdd.getStorageLevel.useMemory
res9: Boolean = false

scala> rdd.cache()
res10: rdd.type = ParallelCollectionRDD[1] at parallelize at <console>:21

scala> rdd.getStorageLevel.useMemory
res11: Boolean = true
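The question also asks how to see everything that is cached. In Scala, SparkContext.getPersistentRDDs returns a map of all RDDs the context has marked as persistent. A sketch for spark-shell (this covers RDDs only; DataFrames cached via df.cache() are tracked separately by Spark SQL's CacheManager and may not appear here):

```scala
// Sketch (spark-shell): list every RDD currently marked persistent.
// getPersistentRDDs returns Map[Int, RDD[_]], keyed by RDD id.
sc.getPersistentRDDs.foreach { case (id, rdd) =>
  println(s"RDD $id: storage level = ${rdd.getStorageLevel.description}")
}
```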
Answered Oct 16 '22 by Patrick McGloin

@Arnab,

Did you find the function in Python? Here is an example for a DataFrame DF:

DF.cache()
print(DF.is_cached)

Hope this helps.
Ram

Answered Oct 16 '22 by user6296218