How to drop DataFrames in PySpark to manage memory?

I list my DataFrames in order to drop the unused ones. First, I used the function below (found in another post) to list the DataFrames:

from pyspark.sql import DataFrame

def list_dataframes():
    # Return the names of all global variables bound to DataFrames
    return [k for (k, v) in globals().items() if isinstance(v, DataFrame)]
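For example, in a fresh notebook session it behaves like this (the SparkSession setup and DataFrame names here are just illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df1 = spark.range(5)          # a DataFrame bound to a global name
df2 = spark.range(5).cache()  # a cached DataFrame
print(list_dataframes())      # e.g. ['df1', 'df2']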

Then I tried to drop the unused ones from the list. This is the code I used:

df2.unpersist()

When I list the DataFrames again, df2 is still there. How can I drop DataFrames to free up some memory in PySpark? Or do you have any other suggestions? Thank you.

asked Jan 27 '23 by melik

1 Answer

Whether you are on an older or the latest Spark, you can use df.unpersist() to release a cached DataFrame. However, older versions have a bug, fixed in the latest release (2.3.2), where the storage memory stats are not updated: unpersist() works, but the stats do not reflect it. So I recommend running on the latest Spark to see the difference in the stats.
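Note that unpersist() only releases the cached/persisted blocks; it does not remove the Python variable itself, which is why df2 still shows up in your globals()-based listing. A minimal sketch of fully dropping a DataFrame (assuming df2 is a cached global DataFrame, as in your question; the gc.collect() call is just an optional nudge to Python's garbage collector):

import gc

df2.unpersist()  # release the cached blocks from storage memory
del df2          # remove the global name binding, so list_dataframes() no longer reports it
gc.collect()     # optional: prompt Python to reclaim the driver-side object

After this, list_dataframes() should no longer include df2.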

Refer to the links below to learn more:

unpersist() issue

Release notes for Spark 2.3.2

Please approve the answer if it was useful.

answered Jan 29 '23 by Sundeep Pidugu