Given a pandas DataFrame df,
one can copy it before modifying it via:
df.copy()
How can I do this with a dask dataframe object?
Mutation on dask.dataframe objects is rare, so this is rarely necessary.
That being said, you can safely just copy the object:
from copy import copy
df2 = copy(df)
No dask.dataframe operation mutates any of the fields of the dataframe, so this is sufficient.
Dask builds internal pipelines of lazy computations. Each new version of your dataframe adds another layer of computations, and none of them run until you ask for a result.
You can branch from these computations either by copying the dataframe as @MRocklin suggests, which gives you a separate object whose later changes do not affect the original, or by continuing on the same stack with:
df = df[df.columns]