I want to list my dataframes so I can drop the unused ones. First I used the function below (found in another post) to list them:
from pyspark.sql import DataFrame

def list_dataframes():
    return [k for (k, v) in globals().items() if isinstance(v, DataFrame)]
Then I tried to drop the unused ones from the list with the code below:
df2.unpersist()
When I list again, df2 is still there. How can I drop the dataframes to free some memory in PySpark? Or do you have any other suggestions? Thank you.
Whether you are on an older or the latest Spark, df.unpersist() is the right call. However, older versions have a bug, fixed in the latest release (2.3.2), where the storage memory stats are not updated: unpersist() does free the cache, but the stats do not reflect it. So I would suggest running on the latest Spark to see the difference in the stats.
Refer to the links below to know more about this:
unpersist() issue
ReleaseNote for 2.3.2