
Why are packages installed rather than just linked to a specific environment?


I've noticed that when Python packages are installed with the usual package managers, they end up inside the environment itself: under /home/user/anaconda3/envs/env_name/ when using conda, and under /home/user/anaconda3/envs/env_name/lib/python3.6/site-packages/ when using pip inside a conda environment.

But conda caches all the recently downloaded packages too.

So, my question is: why doesn't conda install all packages in one central location and, when a package is installed into a specific environment, create a link to that location rather than installing a separate copy there?

I've noticed that environments grow quite big and that this method would probably be able to save a bit of space.

lahsuk asked Apr 08 '19 04:04




1 Answer

Conda already does this. However, because it leverages hardlinks, it is easy to overestimate the space really being used, especially if one only looks at the size of a single env at a time.
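As a rough illustration of why hardlinks make a single env look bigger than it is, here is a minimal Python sketch. It does not use conda at all: the pkgs and envs/demo directories and the lib.bin file are hypothetical stand-ins created in a temporary directory, and it assumes the filesystem supports hardlinks.

```python
import os
import tempfile


def demo_hardlink():
    """Create one file, hardlink it into a second directory, and compare the two."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical stand-ins for a package cache and an environment.
        pkgs = os.path.join(root, "pkgs")
        env = os.path.join(root, "envs", "demo")
        os.makedirs(pkgs)
        os.makedirs(env)

        original = os.path.join(pkgs, "lib.bin")
        with open(original, "wb") as fh:
            fh.write(b"\0" * 4096)

        # A hardlink is a second directory entry for the same file data.
        linked = os.path.join(env, "lib.bin")
        os.link(original, linked)

        a = os.stat(original)
        b = os.stat(linked)
        return {
            "same_inode": (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino),
            "link_count": a.st_nlink,
            "size": a.st_size,
        }


result = demo_hardlink()
print(result)
```

Both paths report the full file size, so a tool that sizes each directory on its own counts the same data twice, even though it is stored only once.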

To illustrate the case, let's use du to inspect the real disk usage. First, if I count each environment directory individually, I get the uncorrected per-env usage:

    $ for d in envs/*; do du -sh $d; done
    2.4G    envs/pymc36
    1.7G    envs/pymc3_27
    1.4G    envs/r-keras
    1.7G    envs/stan
    1.2G    envs/velocyto

which is what it might look like from a GUI.

Instead, if I let du count them together (i.e., correcting for the hardlinks), we get

    $ du -sh envs/*
    2.4G    envs/pymc36
    326M    envs/pymc3_27
    820M    envs/r-keras
    927M    envs/stan
    548M    envs/velocyto

One can see that a significant amount of space is already being saved here.

Most of the hardlinks go back to the pkgs directory, so if we include that as well:

    $ du -sh pkgs envs/*
    8.2G    pkgs
    400M    envs/pymc36
    116M    envs/pymc3_27
     92M    envs/r-keras
     62M    envs/stan
    162M    envs/velocyto

one can see that outside of the shared packages, the envs are fairly light. If the size of my pkgs directory looks large, note that I have never run conda clean on this system, so it is full of tarballs and superseded packages, plus some infrastructure I keep in base (e.g., Jupyter and Git).
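The difference between the per-directory and combined du runs can be approximated in Python. This is only a sketch under stated assumptions: tree_usage is a hypothetical helper, not part of conda or du, it sums apparent file sizes (st_size) rather than allocated disk blocks, and it ignores the space taken by directories themselves, so its numbers will not match du exactly. The idea it shares with du is that each inode is charged only to the first tree that reaches it.

```python
import os
import tempfile


def tree_usage(paths):
    """Return {path: bytes}, counting each inode once across all given trees."""
    seen = set()  # (device, inode) pairs already charged to an earlier tree
    usage = {}
    for top in paths:
        total = 0
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in filenames:
                st = os.lstat(os.path.join(dirpath, name))
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    continue  # hardlink to data that was already counted
                seen.add(key)
                total += st.st_size
        usage[top] = total
    return usage


def demo():
    """Build a tiny hypothetical pkgs/envs layout and size it both ways."""
    with tempfile.TemporaryDirectory() as root:
        pkgs = os.path.join(root, "pkgs")
        env_a = os.path.join(root, "envs", "a")
        env_b = os.path.join(root, "envs", "b")
        for d in (pkgs, env_a, env_b):
            os.makedirs(d)

        shared = os.path.join(pkgs, "shared.bin")
        with open(shared, "wb") as fh:
            fh.write(b"x" * 1000)
        # Both envs hardlink the shared file; env_b also has a file of its own.
        os.link(shared, os.path.join(env_a, "shared.bin"))
        os.link(shared, os.path.join(env_b, "shared.bin"))
        with open(os.path.join(env_b, "own.bin"), "wb") as fh:
            fh.write(b"y" * 10)

        # One call per env: like running du separately on each directory.
        separate = {p: tree_usage([p])[p] for p in (env_a, env_b)}
        # One call for everything: like `du -sh pkgs envs/*`.
        together = tree_usage([pkgs, env_a, env_b])
        return (
            separate[env_a],
            separate[env_b],
            together[pkgs],
            together[env_a],
            together[env_b],
        )


sizes = demo()
print(sizes)
```

Sized separately, each env is charged for the shared file; sized together with pkgs listed first, the shared bytes are attributed to pkgs and each env shows only what is unique to it.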

merv answered Oct 27 '22 19:10