
Can conda environment inherit base packages?

Tags: python, conda

I'm looking for a solution where environments do inherit from root, but searching for an answer turns up a lot of confusion. Many askers believe their environments are inheriting packages when they are not. So the search results surface these questions, but their answers give the opposite solution (or just explain that the asker is mistaken).

That said, one OP actually has a similar objective: "Can packages be shared across Anaconda environments?" This OP says they are running out of space on their HDD. The implication is that "sharing" should reuse the same installed packages in the new environment. The answer (not accepted) is to use --clone.

I also found this post, "Do newly created conda envs inherit all packages from the base env?", which says --clone does not share packages. In this post the OP believed their new environment "shared" packages, and then concludes that "shared" packages don't exist. See also "What is the use of non-separated anaconda environments?"

I tested both the --clone flag and the Conda Docs instructions to "build identical environments". Both env directories are essentially the same size: 2G+.

(base) $ conda list --explicit > spec-file.txt
# Produced Size On Disk: 2.14 GB (2,305,961,984 bytes)

(base) $ conda create --name myclone --clone root
# Produced Size On Disk, clone: 2.14 GB (2,304,331,776 bytes)

The only difference was that building the identical environment downloaded the packages again, while the clone copied the local files, which took much less time.

I use Miniconda to deploy CLI tools to coworker workstations. Basically, the tools all use the same packages, with the occasional exception, when I need to add a particular module which I don't want in the base install.

The goal is to use conda create for environments that extend the base packages similar to virtualenv --system-site-packages, and not to duplicate their installation.


UPDATE 2020-02-08

Responding to @merv and his link to this post ("Why are packages installed rather than just linked to a specific environment?"), which says Conda venvs inherit base packages by default. I had another opportunity this weekend to look at the problem. Here is the base case:

Downloaded the Miniconda installer. Installed with these settings:

  • Install for me
  • Install location: (C:\Users\xtian\Miniconda3_64) NOTE: I added the _64
  • Advanced Options
    • Add Anaconda to the system PATH variable, False
    • Register Anaconda as the system Python 3.7, True

I updated pip and setuptools,

conda update pip setuptools

Below, I list packages in base:

(base) C:\Users\xtian>conda list
# packages in environment at C:\Users\xtian\Miniconda3_64:
#
# Name                    Version                   Build  Channel
asn1crypto                1.3.0                    py37_0
ca-certificates           2020.1.1                      0
certifi                   2019.11.28               py37_0
cffi                      1.13.2           py37h7a1dbc1_0
chardet                   3.0.4                 py37_1003
conda                     4.8.2                    py37_0
conda-package-handling    1.6.0            py37h62dcd97_0
console_shortcut          0.1.1                         3
cryptography              2.8              py37h7a1dbc1_0
idna                      2.8                      py37_0
menuinst                  1.4.16           py37he774522_0
openssl                   1.1.1d               he774522_3
pip                       20.0.2                   py37_1
powershell_shortcut       0.0.1                         2
pycosat                   0.6.3            py37he774522_0
pycparser                 2.19                     py37_0
pyopenssl                 19.1.0                   py37_0
pysocks                   1.7.1                    py37_0
python                    3.7.4                h5263a28_0
pywin32                   227              py37he774522_1
requests                  2.22.0                   py37_1
ruamel_yaml               0.15.87          py37he774522_0
setuptools                45.1.0                   py37_0
six                       1.14.0                   py37_0
sqlite                    3.31.1               he774522_0
tqdm                      4.42.0                     py_0
urllib3                   1.25.8                   py37_0
vc                        14.1                 h0510ff6_4
vs2015_runtime            14.16.27012          hf0eaf9b_1
wheel                     0.34.2                   py37_0
win_inet_pton             1.1.0                    py37_0
wincertstore              0.2                      py37_0
yaml                      0.1.7                hc54c509_2

Then I successfully create a new environment:

(base) C:\Users\xtian>conda create -n wsgiserver
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: C:\Users\xtian\Miniconda3_64\envs\wsgiserver

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done

Here I activate the new wsgiserver environment, list its packages, and finally test with pip, but there is no pip! I tested today with both the 64-bit and 32-bit installers:

(base) C:\Users\xtian>conda activate wsgiserver

(wsgiserver) C:\Users\xtian>conda list
# packages in environment at C:\Users\xtian\Miniconda3_64\envs\wsgiserver:
#
# Name                    Version                   Build  Channel

(wsgiserver) C:\Users\xtian>pip
'pip' is not recognized as an internal or external command,
operable program or batch file.
xtian asked Mar 18 '19 13:03



1 Answer

Should Conda environments inherit base packages?

No. The recommended workflow is to use conda create --clone to create a new standalone environment, and then mutate that environment to add additional packages. Alternatively, one can dump the template environment to a YAML (conda env export > env.yaml), edit it to include or remove packages, and then create a new environment from that (conda env create -f env.yaml -n foo).

Concern about this wasting storage is unfounded in most situations.[1] New environments can appear to take up more space than they really do, because Conda uses hardlinks to minimize redundancy and tools that report directory size often count every link in full. A more detailed analysis of this can be found in the question, Why are packages installed rather than just linked to a specific environment?.
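To make the hardlink point concrete, here is a minimal Python sketch. It does not touch Conda at all: the pkgs and envs/foo directories, the file name, and the disk_usage helper are all hypothetical stand-ins, and it assumes the temporary directory sits on a filesystem that supports hardlinks. It compares the "apparent" size (every directory entry counted) with the size counted once per underlying file.

```python
import os
import tempfile


def disk_usage(root):
    """Return (apparent_bytes, unique_bytes) for all files under root.

    apparent_bytes adds up every directory entry; unique_bytes counts each
    underlying file (device, inode) only once, so hardlinked copies are
    not double counted.
    """
    apparent = 0
    unique = 0
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            apparent += st.st_size
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                unique += st.st_size
    return apparent, unique


def demo():
    """Build a toy 'package cache' plus an 'environment' that hardlinks to it."""
    with tempfile.TemporaryDirectory() as top:
        pkgs = os.path.join(top, "pkgs")
        env = os.path.join(top, "envs", "foo")
        os.makedirs(pkgs)
        os.makedirs(env)
        cached = os.path.join(pkgs, "module.py")
        with open(cached, "wb") as fh:
            fh.write(b"x" * 1000)
        # Hardlink: a second name for the same data, no extra space used.
        os.link(cached, os.path.join(env, "module.py"))
        return disk_usage(top)


if __name__ == "__main__":
    apparent, unique = demo()
    print(f"apparent: {apparent} bytes, unique: {unique} bytes")
```

In this toy layout the one 1000-byte file is reachable under two names, so the apparent total is 2000 bytes while only 1000 bytes are actually stored. Pointing disk_usage at a real Miniconda directory should give a similar comparison, though I have not verified how it behaves on every filesystem.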

Can Conda environments inherit base packages?

It's not supported, but it's possible. First, let's explicitly state that nested activation of Conda environments via the conda activate --stack command does not enable or help to allow inheritance of Python packages across environments. This is because it does not manipulate PYTHONPATH, but instead only keeps the previous active environment on PATH and skips the deactivate scripts. A more detailed discussion of this is available in this GitHub Issue.

Now that we've avoided that red herring, let's talk about PYTHONPATH. One can use this environment variable to include additional site-packages directories to search. So, naively, something like

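conda activate foo
# CONDA_ROOT is not guaranteed to be set in your shell; here it is assumed to hold the base prefix
CONDA_ROOT=$(conda info --base)
PYTHONPATH=$CONDA_ROOT/lib/python3.7/site-packages python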

should launch Python with the packages of both base and foo available to it. A key constraint for this to work is that the Python in the new environment must match that of base up to and including the minor version (in this case 3.7.*).
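The mechanism can be tried without Conda. The following is a rough sketch, assuming a temporary directory stands in for base's site-packages; only_in_base, same_minor, and import_via_pythonpath are hypothetical names introduced for illustration. It starts a child interpreter with PYTHONPATH set and checks that a module living only in that directory gets imported from there.

```python
import os
import subprocess
import sys
import tempfile


def same_minor(version_a, version_b):
    """True if two 'X.Y.Z' version strings agree on major and minor."""
    return version_a.split(".")[:2] == version_b.split(".")[:2]


def import_via_pythonpath(extra_dir, module_name):
    """Start a child interpreter with extra_dir on PYTHONPATH and import module_name.

    Returns the path of the file the child imported the module from.
    Note: this replaces any PYTHONPATH already set in the parent environment.
    """
    env = dict(os.environ, PYTHONPATH=extra_dir)
    code = f"import {module_name}; print({module_name}.__file__)"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def demo():
    """Return True if the child imported the module from the stand-in directory."""
    with tempfile.TemporaryDirectory() as base_site:
        # Hypothetical package that exists only in the stand-in "base" directory.
        with open(os.path.join(base_site, "only_in_base.py"), "w") as fh:
            fh.write("VALUE = 42\n")
        found = import_via_pythonpath(base_site, "only_in_base")
        return os.path.dirname(os.path.realpath(found)) == os.path.realpath(base_site)


if __name__ == "__main__":
    print("imported from stand-in base directory:", demo())
    print("3.7.4 vs 3.7.6 compatible:", same_minor("3.7.4", "3.7.6"))
```

The same_minor check mirrors the constraint above: compiled packages in a site-packages directory are built for one minor version of Python, so the comparison covers only the first two version components.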

Thinking through the details

While this will achieve package inheritance, we need to consider: Will this actually conserve space? I'd argue that in practice it likely won't, and here's why.

Presumably, we don't want to physically duplicate the Python installation, but the new environment must have a Python installed in order to help constrain solving for the new packages we want. To do this, we should match not only the Python version (conda create -n foo python=3.7), but the exact same build as base:

# first check base's python
conda list -n base '^python$'
# EXAMPLE RESULT
# Name                    Version                   Build  Channel
# python                    3.7.6                h359304d_2

# use this when creating the environment
conda create -n foo python=3.7.6=h359304d_2

This will let Conda do its linking thing and use the same physical copy in both environments. However, there is no guarantee that Python's dependencies will also reuse the packages in base. In fact, if any compatible newer versions are available, it will download and install those.
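If you deploy to several workstations, the pinned spec can be derived from the conda list output rather than typed by hand. This is only a sketch: pinned_spec is a hypothetical helper, not part of Conda, and it assumes the column layout shown in the example result above (name, version, build).

```python
def pinned_spec(conda_list_output, package="python"):
    """Return 'name=version=build' for package, parsed from `conda list` text."""
    for line in conda_list_output.splitlines():
        fields = line.split()
        # Skip blank lines and the '#' header/comment lines.
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == package and len(fields) >= 3:
            name, version, build = fields[:3]
            return f"{name}={version}={build}"
    raise LookupError(f"{package!r} not found in conda list output")


# Sample text copied from the example result above.
EXAMPLE = """\
# Name                    Version                   Build  Channel
python                    3.7.6                h359304d_2
"""

if __name__ == "__main__":
    print(pinned_spec(EXAMPLE))
```

For the sample text this yields python=3.7.6=h359304d_2, the same spec passed to conda create above.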

Furthermore, let's say that we now install scikit-learn:

conda install -n foo scikit-learn

This again is going to check for the newest versions of it and its dependencies, independent of whether older but compatible versions of those dependencies are already available through base. So, more packages are unnecessarily being installed into the package cache.

The pattern here seems to be that we really want to find a way to have the foo env install new packages, but use as many of the existing packages as possible to satisfy dependencies. And that is exactly what conda create --clone already does.[2]

Hence, I lose the motivation to bother with inheritance altogether.


Note

I'd speculate that for the special case of pure Python packages it may be plausible to use pip install --target from the base environment to install packages compatible with base to a location outside of base. The user could then add this directory to PYTHONPATH before launching python from base.

This would not be my first choice. I know the clone strategy is manageable; I wouldn't know what to expect with this going long-term.


[1] This will hold as long as the locations of the package cache (pkgs_dirs) and where the environment is created (which defaults to envs_dirs) are on the same volume. Configurations with multiple volumes should be using softlinks, which will ultimately have the same effect. Unless one has manually disabled linking of both types, Conda will do a decent job at silently minimizing redundancy.

[2] Technically, one might also have a stab at using the --offline flag to force Conda to use what it already has cached. However, the premise of OP is that the additional package is new, so it may not be wise to assume we already have a compatible version in the cache.

merv answered Oct 06 '22 19:10