I have a Spark/Scala job in which I do this:

1. Read df1 and cache it into memory.
2. Use df1 to compute dfA.
3. Read df2 (again, it's big) and cache it.

When performing (3), I no longer need df1, and I want to make sure its space gets freed. I cached it at (1) because it gets used in (2), and caching is the only way to make sure it is computed only once rather than on every reuse.
How can I free df1's space and make sure it actually gets freed? What are my options?
I thought of these, but neither seems to be sufficient:

- df = null
- df.unpersist()

Can you document your answer with a proper Spark documentation link?
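For reference, here is a minimal sketch of the job's structure (the paths and transformations are placeholders, not the real ones):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cache-then-free").getOrCreate()

// (1) read df1 and cache it so that (2) computes it only once
val df1 = spark.read.parquet("/data/input1") // placeholder path
df1.cache()

// (2) df1 is reused here; the cache avoids recomputation
val dfA = df1.groupBy("key").count() // placeholder transformation

// (3) read df2 (also big) and cache it -- df1 is no longer needed,
// but its cached blocks still occupy memory at this point
val df2 = spark.read.parquet("/data/input2") // placeholder path
df2.cache()
```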
df.unpersist() should be sufficient, but it won't necessarily free the memory right away. By default it only marks the DataFrame's cached blocks for removal.

You can call df.unpersist(blocking = true) instead, which blocks until the cached data has actually been removed before continuing.
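A sketch of where the call goes, using the names from the question (the materializing action and the df2 read are placeholder assumptions):

```scala
// (2) run the action(s) that depend on df1 before unpersisting it;
// otherwise later actions on dfA will recompute df1 from source
val dfA = df1.groupBy("key").count()
dfA.count() // any action that materializes dfA

// (3) synchronously drop df1's cached blocks before caching df2;
// unpersist(blocking = true) returns only once the blocks are removed
df1.unpersist(blocking = true)

val df2 = spark.read.parquet("/data/input2") // placeholder path
df2.cache()
```

For documentation: the unpersist overloads are listed in the Dataset API docs (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Dataset.html), and the caching and removal behavior is described in the RDD programming guide under "RDD Persistence" (https://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence).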