I have an instance of org.apache.spark.rdd.RDD[MyClass]. How can I programmatically check whether the instance is persisted in memory?
Two of the main features of a Spark RDD are in-memory computation (data resides in memory for faster access and fewer I/O operations) and fault tolerance. Caching can speed up subsequent computations considerably: when the RDD is computed for the first time, it is kept in memory on the node. Spark's cache is fault tolerant, so whenever any partition of a cached RDD is lost, it can be recovered by re-running the transformations that originally created it.
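A minimal sketch of this behavior, assuming a live SparkContext named sc and a hypothetical input path:

val lines  = sc.textFile("hdfs:///logs/app.log")   // hypothetical path
val errors = lines.filter(_.contains("ERROR"))
errors.cache()   // mark for in-memory storage; nothing is computed yet
errors.count()   // first action: computes the partitions and caches them
errors.count()   // served from the cached partitions, no re-read of the file
// If an executor is lost, the missing cached partitions are recomputed
// automatically from the lineage (textFile -> filter).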
isEmpty returns true if and only if the RDD contains no elements at all. Note that an RDD may be empty even when it has at least one partition.
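For example, in the Spark shell (assuming a live SparkContext sc):

scala> sc.parallelize(Seq.empty[Int], 4).isEmpty
res0: Boolean = true

scala> sc.parallelize(Seq(1, 2, 3), 4).isEmpty
res1: Boolean = false

The first RDD has four partitions but no elements, so isEmpty still returns true.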
Yes, the data of all 10 RDDs will be spread across the Spark workers' RAM, but not every machine will necessarily hold a partition of each RDD. And of course an RDD will only have data in memory once some action has been performed on it, since RDDs are lazily evaluated.
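A small sketch of that lazy behavior, assuming a live SparkContext sc and no other cached RDDs in the session:

val r = sc.parallelize(1 to 1000000).map(_ * 2)
r.cache()                     // only marks the RDD; no job runs, nothing in RAM yet
sc.getRDDStorageInfo.length   // 0: no partitions have been materialized so far
r.count()                     // the action triggers computation and fills the cache
sc.getRDDStorageInfo.length   // 1: r's blocks are now reported as cached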
You want RDD.getStorageLevel. It will return StorageLevel.NONE if the RDD is not marked for persistence. However, that only tells you whether it is marked for caching, not whether it has actually been cached. If you want the actual status, you can use the developer API sc.getRDDStorageInfo or sc.getPersistentRDDs.
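A hedged sketch of the difference between "marked for caching" and "actually cached", assuming a live SparkContext sc:

import org.apache.spark.storage.StorageLevel

val r = sc.parallelize(1 to 100).cache()
r.getStorageLevel != StorageLevel.NONE      // true: marked for caching
sc.getRDDStorageInfo.exists(_.id == r.id)   // false: nothing materialized yet
r.count()                                   // action forces computation
sc.getRDDStorageInfo.exists(_.id == r.id)   // true: blocks are now in memory
sc.getPersistentRDDs.contains(r.id)         // true: registered as a persistent RDD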
You can check rdd.getStorageLevel.useMemory to see whether the RDD is marked for in-memory storage, as follows:
scala> myrdd.getStorageLevel.useMemory
res3: Boolean = false
scala> myrdd.cache()
res4: myrdd.type = MapPartitionsRDD[2] at filter at <console>:29
scala> myrdd.getStorageLevel.useMemory
res5: Boolean = true
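Calling unpersist() resets the storage level again (continuing the same session; the exact REPL output may vary by Spark version):

scala> myrdd.unpersist()
res6: myrdd.type = MapPartitionsRDD[2] at filter at <console>:29

scala> myrdd.getStorageLevel.useMemory
res7: Boolean = false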