When I try to create a dask dataframe by reading a pickle file, I get an error:
import dask.dataframe as dd
ds_df = dd.read_pickle(r"D:\test.pickle")
AttributeError: 'module' object has no attribute 'read_pickle'
But it works fine with read_csv.
And reading the same file in pandas was successful as usual.
So please correct me if I am doing something wrong there, or is it that in dask we can't create a dataframe by reading a pickle file at all?
Please note that dask.dataframe does not fully implement pandas. You should not expect every pandas operation to have an analog in dask.dataframe.
We haven't chosen to implement reading from pickle files in particular because there is no way to read only part of a pickle file; everything gets dumped into memory at once. Because of this, pickle files don't have much value when it comes to reading large datasets piece by piece from disk.
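To illustrate the point about partial reads, here is a minimal sketch using only the Python standard library. The sample rows are made up for illustration. It contrasts a line-oriented format such as CSV, where a reader can stop after any row, with a single pickled object, which is rebuilt in full when loaded.

```python
import csv
import io
import pickle

# Hypothetical sample data, used only for this illustration.
rows = [{"id": i, "value": i * i} for i in range(5)]

# CSV is line-oriented: a reader can consume a few rows and stop.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "value"])
writer.writeheader()
writer.writerows(rows)
buf.seek(0)
first_two = [row for _, row in zip(range(2), csv.DictReader(buf))]

# A single pickled object is one opaque blob: loading it rebuilds
# the entire object in memory in one step.
blob = pickle.dumps(rows)
everything = pickle.loads(blob)
```

Here `first_two` holds only the first two records, while `everything` is the full list again.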
If you're just looking for parallelism, then I recommend using pandas.read_pickle along with dask.dataframe.from_pandas:
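import pandas as pd
import dask.dataframe as dd
df = pd.read_pickle(...)  # loads the whole pickle into memory
ddf = dd.from_pandas(df, npartitions=8)  # split into partitions for parallel work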