I would like to install packages into multiple conda environments. Doing this one after the other takes quite some time, so it would be nice if I could run all the conda install
steps for each environment in parallel. Would this be possible or are there conflicts (relating to hard links and lock files, possibly) when trying to run conda in parallel?
Anaconda’s default channel alone has around 635 packages. It is better to install only the packages you require for your application. To do so, open the Anaconda Prompt and use the conda install command, which comes with a range of options. You can list them using conda install --help.
Conda quickly installs, runs and updates packages and their dependencies. Conda easily creates, saves, loads and switches between environments on your local computer. It was created for Python programs, but it can package and distribute software for any language.
conda update will not make the environment inconsistent or, through inaction, allow an environment to become inconsistent. conda update will install the packages explicitly requested by the user on the command line, except when it conflicts with the First Law.
The default repository that conda uses when you run conda install is the Anaconda Distribution Repository. It has about 600 Python packages and a similar number of R packages. pip is similar to conda in many respects, but it’s not closely connected to a particular distribution.
The short answer: No, it should not be run concurrently.
Most of how Conda handles transaction safety was established in version 4.3. The v4.3.0 release notes on changes to locks explicitly comment on running multiple processes:
[U]sers are cautioned that undefined behavior can result when conda is running in multiple process and operating on the same package caches and/or environments.
It sounds like you're talking about different environments, so that shouldn't be an issue. However, you need to ensure that the packages to be installed are already downloaded into the package cache; otherwise it is not safe.
There is a --download-only flag, which will only add the package to the package cache (i.e., the part that cannot be done concurrently). But the issue is that this would still need to be done on a per-env basis, since different envs could have different constraints (e.g., different Python versions) that require different builds of the package.
I think the best you could do at the CLI is
conda install --download-only pkg
sequentially on each env, then
conda install pkg
in parallel for the envs.
This is, however, not in any official recommendation, and changes in how Conda does transactions could lead to this not being safe. I'll also say that I very much doubt this will save you much time; in fact, it might take longer. This approach will involve every env having to solve and prepare transactions twice, and that is usually the most computationally intensive step. The part you end up parallelizing involves disk transactions, which is going to be I/O bound, so I kind of doubt any time will be saved.
While this doesn't positively prove its safety, we can explicitly examine the transactions to make sure that when we run Step 2 above, it will only involve LINK transactions.
To test this, I made two envs:
conda create -n foo -y python=3.6
conda create -n bar -y python=3.6
Then I check the output from
conda install -n foo -d --json pandas
which shows a list of both FETCH and LINK transactions. The former involve manipulating the package cache, whereas the latter only the env. If I then run
conda install -n foo --download-only pandas
and check again,
conda install -n foo -d --json pandas
I now see only LINK transactions. Notably, the same is now true for -n bar, which should reinforce the fact that Step 1 should be done sequentially. The good part is that it won't lead to redownloading the same package; the bad part, that it involves a solve happening in every env. In a more heterogeneous environment, we could expect there might be different FETCH operations in each env.
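The manual FETCH/LINK check above could be scripted roughly as follows. This is only a sketch: `needs_fetch` and `CONDA_BIN` are hypothetical names introduced here, and the check simply greps the dry-run JSON for the string "FETCH", which assumes that key is absent once everything is already in the package cache (as observed in the output described above, but not verified across Conda versions).

```shell
# Sketch only. CONDA_BIN and needs_fetch are hypothetical helper names.
CONDA_BIN="${CONDA_BIN:-conda}"

# needs_fetch ENV PKG
# Exit status 0 if the dry-run plan for PKG in ENV mentions FETCH,
# i.e. the package cache would still be touched; non-zero otherwise.
# Assumption: the JSON contains a "FETCH" key only when downloads are pending.
needs_fetch() {
    "$CONDA_BIN" install -n "$1" -d --json "$2" | grep -q '"FETCH"'
}

# Example usage (commented out so that sourcing this file runs nothing):
# for env_name in foo bar; do
#     if needs_fetch "$env_name" pandas; then
#         echo "$env_name: package cache not ready, run --download-only first"
#     fi
# done
```

Because each call still triggers a solve, this check costs about as much as the dry run it wraps.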
Finally, I can run the parallel final install
conda install -n foo -y pandas & conda install -n bar -y pandas &
which is safe if we can assume that LINK transactions in different envs are safe.
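For more than two envs, the two steps could be wrapped in a small script along these lines. It is a sketch under the same caveats as above, not an officially supported workflow; `install_in_envs` and `CONDA_BIN` are hypothetical names, and the safety of the parallel phase still rests on the assumption that LINK transactions in different envs do not interfere.

```shell
# Sketch only. CONDA_BIN and install_in_envs are hypothetical helper names.
CONDA_BIN="${CONDA_BIN:-conda}"

# install_in_envs PKG ENV [ENV ...]
install_in_envs() {
    pkg="$1"
    shift

    # Step 1 (sequential): fill the shared package cache, one env at a time,
    # since each env may resolve to a different build of the package.
    for env_name in "$@"; do
        "$CONDA_BIN" install -n "$env_name" -y --download-only "$pkg" || return 1
    done

    # Step 2 (parallel): with the cache populated, each install should only
    # need LINK transactions, which touch that env alone.
    pids=""
    for env_name in "$@"; do
        "$CONDA_BIN" install -n "$env_name" -y "$pkg" &
        pids="$pids $!"
    done

    # Wait for every background install and report failure if any failed.
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}

# Example usage (commented out so that sourcing this file runs nothing):
# install_in_envs pandas foo bar
```

Waiting on each PID separately lets the function report a failure in any one env, which the bare `&` one-liner does not do.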