How can I transform my resulting dask.DataFrame into pandas.DataFrame (let's say I am done with heavy lifting, and just want to apply sklearn to my aggregate result)?
Pandas does most things pretty well but falls short in quite a few. Dask does not support these two things well either. There are numerous other things for which you'll have to use pandas.
Dask runs faster than pandas for this query, even when the most inefficient column type is used, because it parallelizes the computation. pandas uses only one CPU core to run the query, whereas Dask uses all four cores on my computer.
You can call the .compute() method to transform a dask.dataframe to a pandas dataframe:
df = df.compute()