
Cache a DataFrame in PySpark

I want to understand more precisely how the cache method works for a DataFrame in PySpark.

When I run df.cache(), it returns a DataFrame. So if I do df2 = df.cache(), which DataFrame is cached? Is it df, df2, or both?

Steven asked Dec 04 '17

1 Answer

I found the source code of DataFrame.cache:

def cache(self):
    """Persists the :class:`DataFrame` with the default storage level (`MEMORY_AND_DISK`).

    .. note:: The default storage level has changed to `MEMORY_AND_DISK` to match Scala in 2.0.
    """
    self.is_cached = True
    self._jdf.cache()
    return self

Therefore, the answer is: both. Since cache() returns self, df2 = df.cache() makes df2 refer to the very same object as df. There is only one DataFrame, and it is cached; both names point to it.
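The "set a flag, return self" pattern above can be illustrated without a Spark cluster. Below is a minimal stand-in class (hypothetical, not the real PySpark API) whose cache method mirrors the source code quoted above, showing why both names end up bound to the same cached object:

```python
# Hypothetical stand-in mimicking DataFrame.cache() from the quoted source.
class FakeDataFrame:
    def __init__(self):
        self.is_cached = False

    def cache(self):
        # Same shape as the real implementation: set the flag, return self.
        self.is_cached = True
        return self

df = FakeDataFrame()
df2 = df.cache()

print(df2 is df)     # True: cache() returned the same object, not a copy
print(df.is_cached)  # True: the one underlying object is marked cached
```

Because cache() mutates the receiver and returns it, assigning the result to a new name is purely cosmetic; df and df2 are two references to one cached DataFrame.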

Steven answered Nov 03 '22