
How to make sure my DataFrame frees its memory?

I have a Spark/Scala job in which I do this:

  • 1: Compute a big DataFrame df1 and cache it in memory
  • 2: Use df1 to compute dfA
  • 3: Read raw data into df2 (again, it's big) and cache it

When performing (3), I no longer need df1, and I want to make sure its space gets freed. I cached it at (1) because it gets used in (2), and caching is the only way to make sure it is computed only once rather than recomputed on every use.
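For reference, the job looks roughly like this (paths, column names, and the transformations are placeholders, not my actual code):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("cache-example").getOrCreate()

// 1: compute a big DataFrame and cache it in memory
val df1 = spark.read.parquet("/data/raw1")   // placeholder path
  .groupBy("key").count()
  .cache()

// 2: df1 is reused here; the cache ensures it is computed only once
val dfA = df1.filter(col("count") > 10)
dfA.write.parquet("/data/out/dfA")           // placeholder path

// 3: read more raw data and cache it -- df1 is no longer needed
val df2 = spark.read.parquet("/data/raw2").cache()
```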

I need to free its space and make sure it gets freed. What are my options?

I thought of these, but they don't seem to be sufficient:

  • df=null
  • df.unpersist()

Can you document your answer with a proper Spark documentation link?

asked Mar 02 '18 17:03 by belka

1 Answer

df.unpersist should be sufficient, but it won't necessarily free the memory right away. It merely marks the DataFrame's cached blocks for removal; the executors drop them asynchronously. Note that setting df = null only releases the driver-side reference for garbage collection — it does nothing to the cached blocks on the executors.

You can use df.unpersist(blocking = true), which blocks until the DataFrame's cached blocks are removed before continuing.
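Applied to the question's setup, it would look like this (df1, df2, and spark are assumed from the question; the path is a placeholder):

```scala
// Non-blocking: marks df1's cached blocks for removal; the executors
// free them asynchronously, so the memory may not be available yet.
df1.unpersist()

// Blocking: waits until all of df1's cached blocks are actually
// deleted before returning, guaranteeing the space is free here.
df1.unpersist(blocking = true)

// Safe to load and cache the next big dataset now.
val df2 = spark.read.parquet("/data/raw2").cache()
```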

answered Sep 21 '22 03:09 by puhlen