Is conda install a thread-safe operation?

I would like to install packages into multiple conda environments. Doing this one after the other takes quite some time, so it would be nice if I could run all the conda install steps for each environment in parallel. Would this be possible or are there conflicts (relating to hard links and lock files, possibly) when trying to run conda in parallel?

asked Oct 02 '19 22:10

RedbackThomson

1 Answer

The short answer: No, it should not be run concurrently.

Most of how Conda handles transaction safety was established in v4.3. The v4.3.0 release notes on the changes to locks explicitly comment on running multiple processes:

[U]sers are cautioned that undefined behavior can result when conda is running in multiple process and operating on the same package caches and/or environments.

It sounds like you're talking about different environments, so the environments themselves shouldn't be an issue. However, you need to ensure that the packages to be installed are already downloaded into the package cache; otherwise it is not safe.

Partial Parallel Strategy

There is a --download-only flag, which only adds the package to the package cache (i.e., it does the part that cannot be done concurrently). The catch is that this still needs to be done on a per-env basis, since different envs could have different constraints (e.g., different Python versions) that require different builds of the package.

I think the best you could do at the CLI is:

1. Run conda install --download-only pkg sequentially on each env, then
2. Run conda install pkg in parallel for the envs.

This is not an officially recommended procedure, however, and changes in how Conda handles transactions could make it unsafe. I also very much doubt it will save you much time; it might even take longer. Every env has to solve and prepare its transaction twice, and the solve is usually the most computationally intensive step. The part you end up parallelizing is the disk transactions, which are I/O bound, so I doubt running them concurrently gains much.
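For anyone who wants to script the two-step strategy, here is a minimal Python sketch. It only wraps the same conda commands described above; the function names and the `run` parameter (a hook so the commands can be swapped out for testing) are illustrative, not part of Conda, and the safety caveats above still apply.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def download_cmd(env, pkg):
    """Step 1 command: only populate the shared package cache."""
    return ["conda", "install", "-n", env, "-y", "--download-only", pkg]


def install_cmd(env, pkg):
    """Step 2 command: the real install, hoped to need only LINK actions."""
    return ["conda", "install", "-n", env, "-y", pkg]


def install_in_envs(envs, pkg, run=subprocess.run):
    """Download sequentially per env, then install into all envs in parallel.

    `run` defaults to subprocess.run; it is a hypothetical injection point
    for this sketch, not anything Conda provides.
    """
    envs = list(envs)

    # Step 1: one env at a time, since each call may write to the package cache.
    for env in envs:
        run(download_cmd(env, pkg), check=True)

    # Step 2: one conda process per env. Threads are enough here because each
    # worker just blocks until its child process exits.
    with ThreadPoolExecutor(max_workers=max(1, len(envs))) as pool:
        return list(pool.map(lambda env: run(install_cmd(env, pkg), check=True), envs))
```

Calling `install_in_envs(["foo", "bar"], "pandas")` would run the download step for foo, then bar, and only then start both installs. With `check=True`, a failed download raises before any parallel install begins.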

Some Evidence For This Being Safe

While this doesn't positively prove its safety, we can explicitly examine the transactions to make sure that when we run Step 2 above, it will only involve LINK transactions.

To test this, I made two envs:

conda create -n foo -y python=3.6
conda create -n bar -y python=3.6

Then I check the output from

conda install -n foo -d --json pandas

which shows a list of both FETCH and LINK transactions. The former involve manipulating the package cache, whereas the latter only the env. If I then run

conda install -n foo --download-only pandas

and check again,

conda install -n foo -d --json pandas

I now see only LINK transactions. Notably, the same is now true for -n bar, because both envs share the package cache; that shared cache is why Step 1 should be done sequentially. The good part is that this won't lead to redownloading the same package; the bad part is that it involves a solve in every env. In a more heterogeneous set of envs, we could expect different FETCH operations in each env.
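The manual check above could be automated along these lines. This is a sketch under an assumption: that the dry-run JSON has a top-level "actions" object mapping names such as "FETCH" and "LINK" to lists of package records. The layout may differ between Conda versions, so inspect the real output before relying on it. The helper names are made up for illustration.

```python
import json


def pending_actions(dry_run_json):
    """Return the names of actions (e.g. FETCH, LINK) that have work queued."""
    report = json.loads(dry_run_json)
    actions = report.get("actions", {})
    # Assumed layout: action names map to lists of package records. Scalar
    # entries (such as a prefix path) and empty lists are ignored.
    return {name for name, items in actions.items() if isinstance(items, list) and items}


def is_link_only(dry_run_json):
    """True when the planned transaction would not fetch into the package cache."""
    return "FETCH" not in pending_actions(dry_run_json)
```

The input would be the captured stdout of `conda install -n foo -d --json pandas`. If `is_link_only` returns False for any env, Step 1 has not finished for that env and the parallel install should not be started.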

Finally, I can run the parallel final install

conda install -n foo -y pandas & conda install -n bar -y pandas & wait

which is safe if we can assume that LINK transactions in different envs are safe.

answered Oct 10 '22 09:10

merv

Tags: python, conda