I've tested various ways to manage my project dependencies in Python so far:
My problem with all of these (except 1.) is that my hard drive fills up pretty fast: I am not a developer, I use Python for my daily work. Therefore, I have hundreds of small projects that all do their thing. Unfortunately, for 80% of the projects I need the "big" packages: numpy, pandas, scipy, matplotlib - you name it. A typical small project is about 1000 to 2000 lines of code, but has 800MB of package dependencies in venv/virtualenv/pipenv. In effect, I have about 100+ GB of my HDD filled with Python virtual environment dependencies.
Moreover, installing all of these in each virtual environment takes time. I am working on Windows, where many packages cannot be easily installed from pip: Shapely, Fiona, GDAL - I need the precompiled wheels from Christoph Gohlke. This is easy, but it breaks most workflows (e.g. pip install -r requirements.txt or pipenv install from a Pipfile). I feel like I spend 40% of my time installing/updating package dependencies and only 60% writing code. Further, none of these package managers really help with publishing & testing code, so I need other tools, e.g. setuptools, tox, semantic-release, twine...
I've talked to colleagues, but they all face the same problem and no one seems to have a real solution. I was wondering if there is an approach to have some packages, e.g. the ones you use in most projects, installed globally - for example, numpy, pandas, scipy, matplotlib would be installed with pip in C:\Python36\Lib\site-packages or with conda in C:\ProgramData\Miniconda3\Lib\site-packages - these are well-developed packages that don't often break things. And if they do, I would want to fix that in my projects soon anyway.
Other things would go in local virtualenv-folders - I am tempted to move my current workflow from pipenv to conda.
Does such an approach make sense at all? There has been a lot of development in Python lately, so perhaps something has emerged that I haven't seen yet. Is there any best-practice guidance on how to set up files in such a mixed global-local environment, e.g. how to maintain setup.py, requirements.txt or pyproject.toml for sharing development projects through Gitlab, Github etc.? What are the pitfalls/caveats?
There's also this great blog post from Chris Warrick that explains it pretty much fully.
[Update 2021]
Since this post still gets many views, here is a subjective 2021 update:
pyproject.toml seems to be the commonly agreed-upon denominator
[Update 2020]
After half a year, I can say that working with Conda (Miniconda) has solved most of my problems:
conda env create -f myenv.yml is the same on every platform
You can install pip in a conda environment and add packages from PyPI with pip. Hint: conda update --all -n myenv -c conda-forge will only update packages from conda, not those installed with pip. Pip-installed dependencies must be updated manually with pip install pack_name --upgrade. Note that installing packages with pip in conda is an emergency solution that should typically be avoided
An environment.yml can specify the conda channel priority, the packages from conda and the packages from pip
Using Miniconda3 Docker makes test runs very simple and straightforward
ymls can be defined strict or open, depending on the situation. E.g. you can fix the env to Python 3.6, but have it retrieve any security updates in this version range (e.g. 3.6.9)
I have a jupyter_env, where jupyter lab and most of my scientific packages are installed (numpy, geos, pandas, scipy etc.) - I activate it whenever I need access to these tools, and I can keep those up to date in a single place. For development of specific packages, I have extra environments that are only used for the package dependencies (e.g. packe1_env). I have about 10 environments overall, which is manageable. Some general purpose tools are installed in the base conda environment, e.g. pylint. Be warned: to make pylint/pycodestyle/autopep8 etc. work (e.g.) in VS Code, they must be installed to the same env that contains the Python code dependencies - otherwise, you'll get unresolved import warnings
I update conda with conda update -n base conda, and my envs with conda update --all -n myenv -c conda-forge once a week, works like a charm!
There is a --stack flag available (as of 2019-02-07) that allows stacking conda environments, e.g. conda activate my_big_env then conda activate --stack dev_tools_env allows making some general purpose packages available in many envs. However, use with caution - I found that code linters, such as pylint, must be in the same env as the dependencies of the code that is linted
Using conda from Windows Subsystem for Linux (WSL) improved my workflow significantly again: packages are installed faster, I can work with VS Code Insiders in Windows directly connected to WSL and there are far fewer bugs with Python packages in the Linux environment.
I set conda config --env --set channel_priority strict - this will only install versions that are compatible. With very few and rare package combinations, this may result in unsolvable dependency conflicts (i.e. the env cannot be created). In this case, I usually create smaller envs for experimental development, with fewer packages and channel_priority set to flexible (the default). Sometimes, package subsets exist that are easier to solve, such as geoviews-core (instead of geoviews) or matplotlib-base (instead of matplotlib). It's also a good approach to try lower Python versions for those experimental envs that are unsolvable with strict, e.g. conda create -n jupyter_exp_env python=3.6 -c conda-forge. A last-resort hack is installing packages with pip, which avoids conda's package resolver (but may result in unstable environments and other issues, you've been warned!). Make sure to explicitly install pip in your env first.
One overall drawback is that conda gets kind of slow when using the large conda-forge channel. They're working on it, but at the same time the conda-forge index is growing really fast.
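Since conda update --all leaves pip-installed packages untouched, it can help to see which packages in an environment came from pip. The following is a minimal sketch, not an official conda or pip feature: it assumes Python 3.8+ and relies on the INSTALLER file that installers usually write into a package's metadata directory. The function name distributions_by_installer is hypothetical, and packages without that file are grouped under "unknown", so treat the result as a heuristic.

```python
from importlib import metadata


def distributions_by_installer():
    """Group installed distribution names by the tool named in their INSTALLER file."""
    groups = {}
    for dist in metadata.distributions():
        try:
            name = dist.metadata.get("Name")
        except Exception:
            # Skip distributions whose metadata is missing or unreadable.
            name = None
        if not name:
            continue
        # INSTALLER usually holds "pip" or "conda"; it may be absent.
        installer = (dist.read_text("INSTALLER") or "").strip() or "unknown"
        groups.setdefault(installer, set()).add(name)
    return {installer: sorted(names) for installer, names in groups.items()}


if __name__ == "__main__":
    for installer, names in distributions_by_installer().items():
        print(f"{installer}: {len(names)} packages")
```

Run inside an activated env, distributions_by_installer().get("pip", []) gives the candidates for a manual pip install pack_name --upgrade.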
These long chains of dependencies can be solved by having a package manager that resolves all dependencies automatically. Other than being a hassle (to resolve all the dependencies manually), manual resolution can mask dependency cycles or conflicts.
Unfortunately, older versions of pip (before 20.3) make no attempt to resolve dependency conflicts. For example, if you install two packages, package A may require a different version of a dependency than package B requires. Pip can install from either Source Distributions (sdist) or Wheel (.whl) files.
Pip relies on package authors to stipulate the dependencies for their code in order to successfully download and install the package plus all required dependencies from the Python Package Index (PyPI). But if packages are installed one at a time, it may lead to dependency conflicts.
Inside env/ there will be a directory called lib which will contain Python and will store your dependencies. Then any time you return to the project, run source env/bin/activate again so that the dependencies can be found.
I was wondering if there is an approach to have some packages, e.g. the ones you use in most projects, installed globally ... Other things would go in local virtualenv-folders
Yes, virtualenv supports this. Install the globally-needed packages globally, and then, whenever you create a virtualenv, supply the --system-site-packages option so that the resulting virtualenv will still be able to use globally-installed packages. When using tox, you can set this option in the created virtualenvs by including sitepackages=true in the appropriate [testenv] section(s).
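The same idea can be sketched with the standard library's venv module, which has an equivalent system_site_packages option. This is an illustration under assumptions, not the virtualenv API itself: the helper names create_shared_env and read_env_config are hypothetical, and with_pip=False is used only to keep the example fast and offline.

```python
import os
import tempfile
import venv
from pathlib import Path


def create_shared_env(env_dir):
    """Create a virtual environment that can also import globally installed packages."""
    builder = venv.EnvBuilder(
        system_site_packages=True,   # same effect as --system-site-packages
        with_pip=False,              # skip bootstrapping pip for this sketch
        symlinks=(os.name != "nt"),  # symlink the interpreter on POSIX, copy on Windows
    )
    builder.create(env_dir)
    return Path(env_dir)


def read_env_config(env_dir):
    """Parse the environment's pyvenv.cfg into a dict of strings."""
    config = {}
    for line in (Path(env_dir) / "pyvenv.cfg").read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            config[key.strip()] = value.strip()
    return config


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env = create_shared_env(os.path.join(tmp, "env"))
        print(read_env_config(env)["include-system-site-packages"])
```

The include-system-site-packages entry in pyvenv.cfg records the choice, so an existing environment can be checked for whether it sees the global packages.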
Problem
You have listed a number of issues that no one approach may be able to completely resolve:
'I need the "big" packages: numpy, pandas, scipy, matplotlib... Virtually I have about 100+ GB of my HDD filled with python virtual dependencies'
... installing all of these in each virtual environment takes time
... none of these package managers really help with publishing & testing code ...
I am tempted to move my current workflow from pipenv to conda.
Thankfully, what you have described is not quite the classic dependency problem that plagues package managers - circular dependencies, pinning dependencies, versioning, etc.
Details
I have used conda on Windows for many years now under similar restrictions with reasonable success. Conda was originally designed to make installing scipy-related packages easier. It still does.
If you are using the "scipy stack" (scipy, numpy, pandas, ...), conda is your most reliable choice.
Conda can:
Conda can't:
Reproducible Envs
The following steps should help reproduce virtualenvs if needed:
Avoid pip-issues
I was wondering if there is an approach to have some packages, e.g. the ones you use in most projects, installed globally ... Other things would go in local virtualenv-folders
Non-conda tools
conda
However, if you want to stay with conda, you can try the following:
A. Make a working environment separate from your base environment, e.g. workenv. Consider this your go-to, "global" env to do the bulk of your daily work.
> conda create -n workenv python=3.7 numpy pandas matplotlib scipy
> activate workenv
(workenv)>
B. Test installations of uncommon pip packages (or weighty conda packages) within a clone of the working env
> conda create --name testenv --clone workenv
> activate testenv
(testenv)> pip install pint
Alternatively, make new environments with minimal packages using a requirements.txt file
C. Make a backup of dependencies (e.g. with conda env export > environment.yml) into a requirements.txt-like file called environment.yml per virtualenv. Optionally make a script to run this command per environment. See docs on sharing/creating environment files. Create environments in the future from this file:
> conda env create --name testenv --file environment.yml
> activate testenv
(testenv)> conda list
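To make the layout of such a file concrete, here is a small sketch that renders a minimal environment.yml as text. It is an illustration only: render_environment_yml is a hypothetical helper, not part of conda, and the package and channel names are just the ones mentioned in this thread. The nested pip list is where packages installed with pip are recorded.

```python
def render_environment_yml(name, channels, conda_packages, pip_packages=()):
    """Return environment.yml text with conda packages and an optional pip section."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {package}" for package in conda_packages]
    if pip_packages:
        # pip itself should be listed as a conda dependency before the nested pip list.
        if not any(p == "pip" or p.startswith("pip=") for p in conda_packages):
            lines.append("  - pip")
        lines.append("  - pip:")
        lines += [f"    - {package}" for package in pip_packages]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_environment_yml(
        "workenv",
        channels=["conda-forge"],
        conda_packages=["python=3.7", "numpy", "pandas"],
        pip_packages=["pint"],
    ))
```

Keeping pip packages in the same file means one file describes the whole environment, although the pip section is still resolved separately from conda's own dependency solver.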
Publishing
The packaging problem is an ongoing, separate issue that has gained traction with the advent of the pyproject.toml file via PEP 518 (see related blog post by author B. Cannon). Packaging tools such as flit or poetry have adopted this modern convention to make distributions and publish them to a server or packaging index (PyPI). The pyproject.toml concept tries to move away from traditional setup.py files with their specific dependence on setuptools.
Dependencies
Tools like pipenv and poetry have a unique modern approach to addressing the dependency problem via a "lock" file. This file allows you to track and reproduce the state of your dependency graphs, something novel in the Python packaging world so far (see more on Pipfile vs. setup.py here). Moreover, there are claims that you can still use these tools in conjunction with conda, although I have not tested the extent of these claims. The lock file isn't standardized yet, but according to core developer B. Cannon in an interview on The future of Python packaging (~33m), "I'd like to get us there." (See Updates).
Summary
If you are working with any package from the scipy stack, use conda (Recommended):
pipenv: use to deploy and make Pipfile.lock
poetry: use to deploy and make poetry.lock
pipenv: develop via pipenv install -e . and manually publish with twine
flit: automatically package and publish
poetry: automatically package and publish
See Also
pyproject.toml, lock files and tools.
pipenv vs. pip, 37m) and dev environment.
Updates: