Suppose I have a pandas DataFrame:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
When I convert it into a Dask DataFrame, what should the name and divisions parameters contain?
from dask import dataframe as dd
sd=dd.DataFrame(df.to_dict(),divisions=1,meta=pd.DataFrame(columns=df.columns,index=df.index))
TypeError: __init__() missing 1 required positional argument: 'name'
Edit: Suppose I create a pandas DataFrame like this:
pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
How can I create a Dask DataFrame the same way, given that it needs three additional arguments: name, divisions, and meta?
sd = dd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, name=, meta=, divisions=)
Thank you for your reply.
A Dask DataFrame is a large parallel DataFrame composed of many smaller pandas DataFrames, split along the index. These pandas DataFrames may live on disk for larger-than-memory computing on a single machine, or on many different machines in a cluster.
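To make "split along the index" concrete, here is a minimal pure-Python sketch of how a divisions tuple maps an index value to a partition. It assumes the documented convention that divisions has one more entry than there are partitions, and that partition i covers index values from divisions[i] up to (but not including) divisions[i + 1], except the last partition, which also includes its upper bound. The function name partition_for is hypothetical and is not part of Dask; this is only a model of the idea, not Dask's implementation.

```python
from bisect import bisect_right


def partition_for(index_value, divisions):
    """Return the number of the partition that would hold index_value.

    Hypothetical helper for illustration only; not a Dask function.
    """
    if not divisions[0] <= index_value <= divisions[-1]:
        raise KeyError(index_value)
    # Partition i covers [divisions[i], divisions[i + 1]).
    position = bisect_right(divisions, index_value) - 1
    # The last partition also includes its upper bound.
    return min(position, len(divisions) - 2)


divisions = (0, 1, 2)  # three boundaries describe two partitions
print([partition_for(i, divisions) for i in (0, 1, 2)])  # [0, 1, 1]
```

So with divisions=(0, 1, 2), the row with index 0 sits in the first partition and the rows with index 1 and 2 sit in the second.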
I think you can use dask.dataframe.from_pandas:
from dask import dataframe as dd
# For a small frame, Dask may create fewer partitions than requested
sd = dd.from_pandas(df, npartitions=3)
print(sd)
dd.DataFrame<from_pa..., npartitions=2, divisions=(0, 1, 2)>
EDIT:
I found a solution that calls the low-level constructor directly:
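import pandas as pd
import dask.dataframe as dd
from dask.dataframe.utils import make_meta
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
# One graph key per partition: divisions=[0, 1, 2] describes two partitions,
# so the graph needs both ('x', 0) and ('x', 1)
dsk = {('x', 0): df.iloc[:1], ('x', 1): df.iloc[1:]}
# meta is an empty frame that carries only the column names and dtypes
meta = make_meta({'a': 'i8', 'b': 'i8'}, index=pd.Index([], dtype='i8'))
# Low-level constructor; newer Dask releases may not accept these arguments
d = dd.DataFrame(dsk, name='x', meta=meta, divisions=[0, 1, 2])
print(d)
dd.DataFrame<x, npartitions=2, divisions=(0, 1, 2)>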