
How to copy a dask dataframe?

Tags: python, dask

Given a pandas df one can copy it before doing anything via:

df.copy()

How can I do this with a dask dataframe object?

Michael asked Aug 03 '16 11:08


2 Answers

Mutation on dask.dataframe objects is rare, so this is rarely necessary.

That being said, you can safely just copy the object:

from copy import copy
df2 = copy(df)

No dask.dataframe operation mutates any of the fields of the dataframe, so this is sufficient.

MRocklin answered Oct 13 '22 00:10


Dask builds internal pipelines of lazy computations. Every version of your dataframe is another layer of computations, none of which are run until you ask for a result.

You can branch off from these computations by copying the dataframe as @MRocklin suggests, in which case you are working on a separate stack of computations, or you can continue on the same stack by selecting all of its columns:

df = df[df.columns]

André C. Andersen answered Oct 12 '22 23:10