I am working with extremely large datasets, so I need to remove any intermediate dataframe. How do I ensure that any dataframe that I don't need is deleted from memory/disk?
If the DataFrames were cached (via cache() or persist()), call unpersist() on each one you no longer need to free its blocks from memory and disk, or use spark.catalog.clearCache() to drop everything from the cache at once. DataFrames that were never cached are just lazy query plans and don't hold executor memory, so there is nothing to delete for them.
https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/catalog/Catalog.html
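A minimal PySpark sketch of both approaches (the table/column names here are made up for illustration):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-cleanup-demo").getOrCreate()

# Hypothetical intermediate DataFrame that we cache for reuse.
intermediate_df = spark.range(1_000_000).withColumnRenamed("id", "value")
intermediate_df.cache()
intermediate_df.count()  # materializes the cache

# ... use intermediate_df in downstream transformations ...

# Option 1: release just this DataFrame's cached blocks (memory and disk).
intermediate_df.unpersist()

# Option 2: clear every cached table/DataFrame in this session at once.
spark.catalog.clearCache()
```

unpersist() is the more surgical choice when other cached DataFrames are still needed; clearCache() is a blunt instrument that wipes the whole cache for the session.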