add a dask.array column to a dask.dataframe

I have a dask dataframe and a dask array with the same number of rows, in the same logical order. The dataframe rows are indexed by strings. I am trying to add one of the array's columns to the dataframe, and have tried several approaches, each of which failed in its own way.

df['col'] = da.col
# TypeError: Column assignment doesn't support type Array

df['col'] = da.to_frame(columns='col')
# TypeError: '<' not supported between instances of 'str' and 'int'

df['col'] = da.to_frame(columns=['col']).set_index(df.col).col
# TypeError: '<' not supported between instances of 'str' and 'int'

df = df.reset_index()
df['col'] = da.to_frame(columns='col')
# ValueError: Not all divisions are known, can't align partitions. Please use `set_index` to set the index.

and a few other variants.

What is the right way to add a dask array column to a dask dataframe when the structures are logically compatible?

Daniel Mahler asked Jan 08 '18

1 Answer

This does seem to work as of dask version 2021.4.0, and possibly earlier. Just make sure the array's chunks line up with the dataframe's partitions: the same number of blocks, with the same sizes.

import dask.array as da
import dask.dataframe as dd
import numpy as np
import pandas as pd

# Two partitions of two rows each...
ddf = dd.from_pandas(pd.DataFrame({'z': np.arange(100, 104)}),
                     npartitions=2)
# ...matched by two chunks of two elements each.
ddf['a'] = da.arange(200, 204, chunks=2)
print(ddf.compute())

Output:

     z    a
0  100  200
1  101  201
2  102  202
3  103  203
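
The original question involves a dataframe indexed by strings, where the array's chunks may not line up with the partitions out of the box. Below is a minimal sketch of one way to handle that, under the same dask version assumption: compute each partition's length with map_partitions(len), rechunk the array to match, then assign. The variable names (pdf, lengths, arr) are illustrative, not from the original post.

import dask.array as da
import dask.dataframe as dd
import numpy as np
import pandas as pd

# A dataframe indexed by strings, as in the question.
pdf = pd.DataFrame({'z': np.arange(100, 104)},
                   index=['p', 'q', 'r', 's'])
ddf = dd.from_pandas(pdf, npartitions=2)

# Compute each partition's length, then rechunk the array so its
# blocks line up with those partitions before assigning.
lengths = tuple(ddf.map_partitions(len).compute())
arr = da.arange(200, 204, chunks=4).rechunk((lengths,))

ddf['a'] = arr
print(ddf.compute())

If the chunks already match the partitions, the rechunk step is unnecessary; the assignment pairs array blocks with dataframe partitions positionally rather than aligning on index values. dask.dataframe.from_dask_array also accepts an index argument, which may be another route when the partitioning already agrees.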
HoosierDaddy answered Sep 18 '22