
Creating a dask dataframe by reading a pickle file in Python

Tags: python, dask

When I try to create a dask dataframe by reading a pickle file, I get an error:

import dask.dataframe as dd
ds_df = dd.read_pickle(r"D:\test.pickle")  # raw string, so "\t" is not read as a tab

AttributeError: 'module' object has no attribute 'read_pickle'

But it works fine with read_csv.

In pandas, read_pickle works as usual.

So please correct me if I am doing something wrong, or tell me whether dask simply cannot create a dataframe by reading a pickle file.

Satya asked Dec 14 '15 09:12



1 Answer

Please note that dask.dataframe does not fully implement pandas. You should not expect every pandas operation to have an analog in dask.dataframe.

We haven't chosen to implement reading from pickle files in particular because there is no way to read only part of a pickle file; everything gets dumped into memory at once. Because of this, pickle files don't have much value when it comes to reading large datasets piece by piece from disk.
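To illustrate the point about partial reads, here is a minimal sketch that uses only the standard library. The file names and sample data are made up for illustration. A CSV file can be consumed a few rows at a time, whereas pickle.load on a file that holds a single pickled object rebuilds the whole object before returning.

```python
import csv
import itertools
import os
import pickle
import tempfile

# Hypothetical sample data, purely for illustration.
rows = [[i, i * i] for i in range(1000)]

tmpdir = tempfile.mkdtemp()
csv_path = os.path.join(tmpdir, "sample.csv")
pkl_path = os.path.join(tmpdir, "sample.pickle")

with open(csv_path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

with open(pkl_path, "wb") as f:
    pickle.dump(rows, f)

# CSV: the reader is lazy, so we can stop after the first 5 rows.
with open(csv_path, newline="") as f:
    first_csv_rows = list(itertools.islice(csv.reader(f), 5))

# Pickle: load() deserializes the entire object; slicing happens afterwards.
with open(pkl_path, "rb") as f:
    everything = pickle.load(f)
first_pkl_rows = everything[:5]

print(len(first_csv_rows), len(everything))
```

In the CSV case only five rows are ever parsed; in the pickle case all 1000 rows are in memory before the slice is taken.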

If you're just looking for parallelism, then I recommend using pandas.read_pickle along with dask.dataframe.from_pandas:

import pandas as pd
import dask.dataframe as dd
df = pd.read_pickle(...)
ddf = dd.from_pandas(df, npartitions=8)
MRocklin answered Sep 19 '22 15:09
