Dask, create a dataframe from several dask arrays

Tags:

python

dask

Suppose I have a set of dask arrays such as:

import numpy as np
import dask.array as da

c1 = da.from_array(np.arange(100000, 190000), chunks=1000)
c2 = da.from_array(np.arange(200000, 290000), chunks=1000)
c3 = da.from_array(np.arange(300000, 390000), chunks=1000)

Is it possible to create a dask dataframe from them? In pandas I could write:

data = {}
data['c1'] = c1
data['c2'] = c2
data['c3'] = c3

df = pd.DataFrame(data)

Is there a similar way to do this with dask?

asked Mar 28 '17 01:03

Jason Solack

1 Answer

The following should work:

import numpy as np
import dask.array as da
import dask.dataframe as dd

c1 = da.from_array(np.arange(100000, 190000), chunks=1000)
c2 = da.from_array(np.arange(200000, 290000), chunks=1000)
c3 = da.from_array(np.arange(300000, 390000), chunks=1000)

# generate dask dataframe: one column per array, concatenated side by side
ddf = dd.concat([dd.from_dask_array(c) for c in [c1, c2, c3]], axis=1)
# name columns
ddf.columns = ['c1', 'c2', 'c3']

answered Sep 21 '22 21:09

Arco Bast
