Default pip installation of Dask gives "ImportError: No module named toolz"

I installed Dask using pip like this:

pip install dask

and when I try to run import dask.dataframe as dd, I get the following error message:

>>> import dask.dataframe as dd
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/to/venv/lib/python2.7/site-packages/dask/__init__.py", line 5, in <module>
    from .async import get_sync as get
  File "/path/to/venv/lib/python2.7/site-packages/dask/async.py", line 120, in <module>
    from toolz import identity
ImportError: No module named toolz

I noticed that the documentation states

pip install dask: Install only dask, which depends only on the standard library. This is appropriate if you only want the task schedulers.

so I'm confused as to why this didn't work.

asked Jan 03 '17 22:01

TheDudeAbides

2 Answers

In order to use Dask's parallelized dataframes (built on top of pandas), you have to tell pip to install some "extras" (reference), as mentioned in the Dask installation documentation:

pip install "dask[dataframe]"

Or you could just do

pip install "dask[complete]"

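to get the whole bag of tricks. NB: The double quotes may or may not be required, depending on your shell; zsh, for example, treats the square brackets as a glob pattern and fails with "no matches found" if they are left unquoted.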
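If you would rather drive the install from Python, one option is to hand pip a list of arguments, which bypasses the shell entirely so the brackets never need quoting. A minimal sketch, assuming you want pip for the same interpreter that runs the script (sys.executable); the helper name dask_install_command is hypothetical:

```python
import shlex
import sys


def dask_install_command(*extras):
    """Return a pip argument list for installing dask with the given extras.

    Hypothetical helper. Running pip as ``sys.executable -m pip`` targets the
    interpreter executing this code, so packages land in the environment you
    will later import them from.
    """
    requirement = "dask"
    if extras:
        # pip accepts several extras separated by commas, e.g. dask[array,dataframe]
        requirement += "[" + ",".join(extras) + "]"
    return [sys.executable, "-m", "pip", "install", requirement]


# The brackets are shell glob characters, so typed into a shell they need quoting:
print(shlex.quote("dask[dataframe]"))  # prints 'dask[dataframe]'
```

Passing the list to subprocess.run(dask_install_command("dataframe"), check=True) runs pip without any shell, so the quoting question never comes up.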

The justification for this is (or was) mentioned in the Dask documentation:

We do this so that users of the lightweight core dask scheduler aren’t required to download the more exotic dependencies of the collections (numpy, pandas, etc.)
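A common way for a package to keep such dependencies optional is to import them only inside the subpackage that needs them and to turn the bare ImportError into a message naming the missing extra. A minimal sketch of that pattern; require_optional is a hypothetical helper, not part of Dask's API:

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency, or fail with an install hint (hypothetical helper).

    ``extra`` is the name of the pip extra that would pull the module in.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name} is required for this feature. "
            f'Install it with: pip install "dask[{extra}]"'
        ) from err


# Inside a collection subpackage (e.g. a dataframe module), the heavy
# dependency is only requested when that subpackage itself is imported:
# pd = require_optional("pandas", "dataframe")
```

The core package never touches pandas, so users of the schedulers alone never need it installed.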

As mentioned in Obinna's answer, you may wish to do this inside a virtualenv, or use pip install --user to put the libraries in your home directory if, say, you don't have admin privileges on the host OS.
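To see which of those situations applies to the interpreter you are using, a small check like the following can help. It is a sketch: it relies on venv setting sys.prefix while sys.base_prefix keeps pointing at the base install (Python 3.3+), with sys.real_prefix as the marker older virtualenv releases used. Note that pip typically refuses --user inside a virtual environment, so the two options are alternatives.

```python
import site
import sys


def in_virtualenv():
    """Return True when running inside a venv/virtualenv environment."""
    # venv: sys.prefix is the env directory, sys.base_prefix the base install.
    # Older virtualenv releases set sys.real_prefix instead.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix) or hasattr(sys, "real_prefix")


print("inside a virtual environment:", in_virtualenv())
# Directory that `pip install --user` targets for this interpreter
print("user site-packages:", site.getusersitepackages())
```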

Extra details

At Dask 0.13.0 and below, there was a requirement on toolz's identity function within dask/async.py. There is a closed pull request associated with GitHub issue #1849 to remove this dependency. If, for some reason, you are stuck with an older version of dask, you can work around that particular issue by simply doing pip install toolz.
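toolz.identity simply returns its argument, which is why it was an easy dependency to drop. If you maintain code that imports it the same way, one way to make the import optional is a local fallback. This is a sketch of the general technique, not necessarily what the pull request did:

```python
try:
    # Use toolz's implementation when toolz is installed
    from toolz import identity
except ImportError:
    def identity(x):
        """Return x unchanged (same behaviour as toolz.identity)."""
        return x


# identity is handy as a no-op default for a "key" or "transform" argument
print(sorted([3, 1, 2], key=identity))  # prints [1, 2, 3]
```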

But this wouldn't (completely) fix your problem with import dask.dataframe as dd anyway, because you'd still get this error:

>>> import dask.dataframe as dd
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/staff_agbio/PhyloWeb/data/dask-test/venv/local/lib/python2.7/site-packages/dask/dataframe/__init__.py", line 3, in <module>
    from .core import (DataFrame, Series, Index, _Frame, map_partitions,
  File "/data/staff_agbio/PhyloWeb/data/dask-test/venv/local/lib/python2.7/site-packages/dask/dataframe/core.py", line 12, in <module>
    import pandas as pd
ImportError: No module named pandas

or if you had pandas installed already, you'd get ImportError: No module named cloudpickle. So, pip install "dask[dataframe]" seems to be the way to go if you're in this situation.
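Because each missing module only surfaces once the previous one is installed, it can help to check the whole list up front. A minimal sketch using importlib.util.find_spec; the module list is an assumption based on the errors above plus numpy and toolz, not the authoritative contents of the dataframe extra, which has changed between Dask releases:

```python
import importlib.util

# Assumed dependency list for dask.dataframe; check your Dask version's
# packaging metadata for the real one.
DATAFRAME_DEPS = ["toolz", "cloudpickle", "numpy", "pandas"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    # find_spec returns None for a top-level module that is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(DATAFRAME_DEPS)
if missing:
    print("missing:", ", ".join(missing))
    print('try: pip install "dask[dataframe]"')
else:
    print("all listed dependencies are importable")
```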

answered Oct 11 '22 12:10

I had this same issue and this was what fixed it for me.

1. Create a virtual env for your project.
2. cd into your project directory (not required if you're comfortable navigating directories).
3. Activate your virtual env.
4. Run pip install "dask[complete]" to install everything. If you only need a given component, such as dataframe, use pip install "dask[dataframe]" instead.

The bottom line was that I had to be in my virtual environment; this installs dask for that environment only.
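The same steps can also be scripted. A minimal sketch using the standard-library venv module (Python 3 only; virtualenv plays this role on Python 2). create_env is a hypothetical helper, and instead of activating the environment it returns the environment's own interpreter so you can call pip through it:

```python
import os
import venv


def create_env(env_dir, with_pip=True):
    """Create a virtual environment and return the path to its Python interpreter.

    with_pip=True bootstraps pip into the new environment via ensurepip.
    """
    venv.create(env_dir, with_pip=with_pip)
    # The interpreter lives in Scripts\ on Windows and bin/ elsewhere
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")
```

Running the returned interpreter as <python> -m pip install "dask[complete]" installs Dask into that environment only, which has the same effect as activating the environment first.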

answered Oct 11 '22 13:10

Obinna Nnenanya


Tags: python, installation, pip, importerror, dask