 

How does conda work internally?

I have searched for a while but couldn't find a satisfactory answer:

How does conda (http://conda.pydata.org) work internally? Any details are welcome...

Furthermore, as it is Python-agnostic and apparently works so well and fluently, why is it not used as a general-purpose package manager like apt or yum?

What are the restrictions of using only conda as a package manager? Would it work?

Or the other way round, why are e.g. apt and yum not able to provide the functionality conda provides? Is conda "better" than those package managers or just different?

Thanks for any hints!

Asked by SebastianNeubauer, Jan 03 '15 10:01





1 Answer

I explain a lot of this in my SciPy 2014 talk. Let me give a little outline here.

First off, a conda package is really simple. It is just a tarball of the files that are to be installed, along with some metadata in an info directory. For example, the conda package for Python is a tarball containing these files:

info/
    files
    index.json
    ...
bin/
    python
    ...
lib/
    libpython.so
    python2.7/
        ...
    ...
...

You can see exactly what it looks like by looking at the extracted packages in the Anaconda pkgs directory. The full spec is at https://docs.conda.io/projects/conda-build/en/latest/source/package-spec.html.
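To make that layout concrete, here is a minimal Python sketch that builds a toy tarball with the same shape and reads its metadata back. The package name `toy`, its version, and the helper names `build_toy_package` and `read_index` are made up for illustration. Real conda packages carry more metadata than this, and this is not conda's own code.

```python
import io
import json
import tarfile


def build_toy_package(path, files, index):
    """Write a .tar.bz2 holding payload files plus info/files and info/index.json.

    `files` maps relative paths to text content; `index` is a metadata dict.
    Both are hypothetical stand-ins for what a real build would produce.
    """
    members = dict(files)
    # info/files lists every payload file, one relative path per line.
    members["info/files"] = "\n".join(sorted(files)) + "\n"
    members["info/index.json"] = json.dumps(index, indent=2)
    with tarfile.open(path, "w:bz2") as tar:
        for name, text in members.items():
            data = text.encode("utf-8")
            entry = tarfile.TarInfo(name)
            entry.size = len(data)
            tar.addfile(entry, io.BytesIO(data))


def read_index(path):
    """Return (metadata dict, list of payload paths) from a toy package."""
    with tarfile.open(path, "r:bz2") as tar:
        index = json.load(tar.extractfile("info/index.json"))
        listing = tar.extractfile("info/files").read().decode("utf-8").split()
    return index, listing
```

Calling `build_toy_package("toy-1.0-0.tar.bz2", {"bin/toy": "..."}, {"name": "toy", "version": "1.0"})` and then `read_index` on the result gives back the metadata and file list without unpacking the payload.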

When conda installs this, it extracts the tarball to the pkgs directory and hard links the files into the installation environment. Finally, files that contain hard-coded installation paths have those paths replaced (usually in shebang lines).
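The linking step can be sketched in a few lines of Python. This is an illustration under assumptions, not conda's implementation: `link_package` is a hypothetical helper, it expects a package that is already extracted, and it assumes `info/files` holds one relative path per line with no spaces in the paths.

```python
import os


def link_package(extracted_dir, env_prefix):
    """Hard link every file listed in info/files from the cache into an environment."""
    with open(os.path.join(extracted_dir, "info", "files")) as listing:
        relative_paths = listing.read().split()
    for rel in relative_paths:
        source = os.path.join(extracted_dir, rel)
        target = os.path.join(env_prefix, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        # A hard link is a second directory entry for the same data on disk,
        # so no file contents are copied.
        os.link(source, target)
    return relative_paths
```

Hard links only work within a single filesystem; the sketch does not handle the case where the cache and the environment live on different ones.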

That's basically it. There is some more stuff that happens in terms of dependency resolution, but once it knows what packages it's going to install, that's how it does it.

The process of building a package is a little more complicated. @mattexx's answer and the document it links to describe a bit of the canonical way of building a package using conda build.

To answer your other questions:

Furthermore, as it is Python-agnostic and apparently works so well and fluently, why is it not used as a general-purpose package manager like apt or yum?

You certainly can. The only thing limiting this is the set of packages that have been built for conda. On Windows, this is a very nice option, as there aren't any system package managers like there are on Linux.

What are the restrictions of using only conda as a package manager? Would it work?

It would work, assuming you have conda packages for everything you are interested in. The main restriction is that conda only wants to install things into the conda environment itself, so things that require specific installation locations on the system might not be well suited to conda (although it's still doable, if you set that location as your environment path). Or for instance, conda might not be a suitable replacement for "project level" package managers like bower.

Also, conda probably shouldn't be used to manage system level libraries (libraries that must be installed in the / prefix), like kernel extensions or the kernel itself, unless you were to build out a distribution that uses conda as a package manager explicitly.

The main thing I will say about these things is that conda packages are generally made to be relocatable, meaning the installation prefix of the package does not matter. This is why hard coded paths are changed as part of the install process, for instance. It also means that dynamic libraries built with conda build will have their RPATHs (on Linux) and install names (on OS X) changed automatically to use relative paths instead of absolute ones.
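For text files such as scripts with shebang lines, the path rewriting amounts to a search and replace. Below is a rough Python sketch of that idea. The function name `rewrite_prefix` and the example prefixes are hypothetical, and the sketch covers only UTF-8 text files; binaries, RPATHs, and install names need dedicated tools and are not handled here.

```python
import os
import shutil


def rewrite_prefix(path, build_prefix, install_prefix):
    """Swap the build-time prefix for the install prefix in one text file.

    Returns True if the file was changed, False otherwise.
    """
    with open(path, encoding="utf-8") as handle:
        original = handle.read()
    updated = original.replace(build_prefix, install_prefix)
    if updated == original:
        return False
    # Write a fresh file and swap it in, rather than overwriting in place:
    # if `path` is a hard link, an in-place write would also change every
    # other name that points at the same data.
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as handle:
        handle.write(updated)
    shutil.copymode(path, tmp)  # keep the executable bit on scripts
    os.replace(tmp, path)
    return True
```

With a script whose first line is `#!/hypothetical/build/prefix/bin/python`, calling `rewrite_prefix(script, "/hypothetical/build/prefix", "/home/me/envs/demo")` leaves it starting with `#!/home/me/envs/demo/bin/python`.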

Or the other way round, why are e.g. apt and yum not able to provide the functionality conda provides? Is conda "better" than those package managers or just different?

In some ways it's better, and in some ways it's not. Your system package manager knows your system, and there are packages in there that are not going to be in conda (and some, like the kernel, that probably shouldn't be in conda).

The main advantage of conda is its notion of environments. Since packages are made to be relocatable, you can install the same package in multiple places, and effectively have completely independent installs of everything, basically for free.
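The "basically for free" part follows from hard linking: several environments can each have their own directory tree while sharing one copy of the data. A small Python demonstration, with made-up paths and a made-up file `libtoy.so`, assuming a filesystem that supports hard links:

```python
import os
import tempfile


def make_env(cache_file, env_prefix, rel_path):
    """Expose one cached file inside an environment via a hard link."""
    target = os.path.join(env_prefix, rel_path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    os.link(cache_file, target)
    return target


root = tempfile.mkdtemp()
cached = os.path.join(root, "pkgs", "toy-1.0-0", "lib", "libtoy.so")
os.makedirs(os.path.dirname(cached))
with open(cached, "wb") as handle:
    handle.write(b"\0" * 1024)  # stand-in for a real shared library

env_a = make_env(cached, os.path.join(root, "envs", "a"), "lib/libtoy.so")
env_b = make_env(cached, os.path.join(root, "envs", "b"), "lib/libtoy.so")

# The cache entry and both environment entries are names for the same file,
# so the two environments add directory entries but no extra file data.
print(os.path.samefile(env_a, env_b))
print(os.stat(env_a).st_ino == os.stat(cached).st_ino)
```

Removing one environment's tree only drops its names for the file; the data stays available to the cache and to the other environment.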

Does it use some kind of containerization

No, the only "containerization" is having separate install directories and making packages relocatable.

or static linking of all the dependencies,

The dependency linking is completely up to the package itself. Some packages statically link their dependencies, some don't. The dynamically linked libraries have their load paths changed as I described above to be relocatable.

why is it so "cross platform"?

"Cross platform" in this case means "cross operating system". Although the same binary package can't work across OS X, Linux, and Windows, the point is that conda itself works identically on all three, so if you have the same packages built for all three platforms, you can manage them all the same way regardless of which one you are on.

Answered by asmeurer, Sep 28 '22 04:09
