ValueError: Not all divisions are known, can't align partitions error on dask dataframe

I have a pandas dataframe with the following columns:

user_id user_agent_id requests

All columns contain integers. I want to perform some operations on them and run them with a Dask dataframe. This is what I do:

import dask.dataframe as df  # assumed import; the code below uses `df` as the Dask alias

user_profile = cache_records_dataframe[['user_id', 'user_agent_id', 'requests']] \
    .groupby(['user_id', 'user_agent_id']) \
    .size().to_frame(name='appearances') \
    .reset_index()  # I am not sure I can run this on a dask dataframe

user_profile_ddf = df.from_pandas(user_profile, npartitions=4)
user_profile_ddf['percent'] = user_profile_ddf.groupby('user_id')['appearances'] \
    .apply(lambda x: x / x.sum(), meta=float)  # Percentage of appearances within each user group

But I get the following error:

raise ValueError("Not all divisions are known, can't align "
ValueError: Not all divisions are known, can't align partitions. Please use `set_index` to set the index.

Am I doing something wrong? In pure pandas it works well, but it gets slow for many rows (although they fit in memory), so I want to parallelize the computation.
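For reference, the pure-pandas version of this computation might look like the sketch below. The sample rows are hypothetical; only the column names come from the question, and `transform("sum")` is one common way to write the per-user percentage, not necessarily the code the asker ran.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
cache_records_dataframe = pd.DataFrame({
    "user_id":       [1, 1, 1, 2, 2],
    "user_agent_id": [10, 10, 11, 10, 10],
    "requests":      [5, 3, 2, 7, 1],
})

# Count how often each (user_id, user_agent_id) pair appears.
user_profile = (
    cache_records_dataframe
    .groupby(["user_id", "user_agent_id"])
    .size()
    .to_frame(name="appearances")
    .reset_index()
)

# Share of each user agent within its user's total appearances.
user_totals = user_profile.groupby("user_id")["appearances"].transform("sum")
user_profile["percent"] = user_profile["appearances"] / user_totals

print(user_profile)
```

Within each `user_id`, the `percent` values add up to 1.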

asked Jul 11 '17 09:07

Apostolos

1 Answer

When creating the Dask dataframe, call reset_index() on it:

user_profile_ddf = df.from_pandas(user_profile, npartitions=4).reset_index()

answered Sep 20 '22 16:09

skibee

Tags:

python

dataframe

dask

dask-distributed
