Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dask, create a dataframe from several dask arrays

Tags:

python

dask

Suppose I have a set of dask arrays such as:

c1 = da.from_array(np.arange(100000, 190000), chunks=1000)
c2 = da.from_array(np.arange(200000, 290000), chunks=1000)
c3 = da.from_array(np.arange(300000, 390000), chunks=1000)

is it possible to create a dask dataframe from them? In pandas i could say:

data = {}
data['c1'] = c1
data['c2'] = c2
data['c3'] = c3

df = pd.DataFrame(data)

is there a similar way to do this with dask?

like image 870
Jason Solack Avatar asked Mar 28 '17 01:03

Jason Solack


People also ask

Is Dask merge faster than pandas?

Using dask instead of pandas to merge large data sets The python package dask is a powerful python package that allows you to do data analytics in parallel which means it should be faster and more memory efficient than pandas .

Is Dask faster than Numpy?

That's where Dask arrays provide much more flexibility than Numpy. They enable you to work with larger-than-memory objects, and computation time is significantly faster due to parallelization.


1 Answers

The following should work:

import pandas as pd, numpy as np 
import dask.array as da, dask.dataframe as dd

c1 = da.from_array(np.arange(100000, 190000), chunks=1000)
c2 = da.from_array(np.arange(200000, 290000), chunks=1000)
c3 = da.from_array(np.arange(300000, 390000), chunks=1000)

# generate dask dataframe
ddf = dd.concat([dd.from_dask_array(c) for c in [c1,c2,c3]], axis = 1) 
# name columns
ddf.columns = ['c1', 'c2', 'c3']
like image 152
Arco Bast Avatar answered Sep 21 '22 21:09

Arco Bast