Anaconda has become very popular in scientific computing because it bundles together over 125 of the most widely used Python data analysis libraries. My question is: since we already have pip, a very widely used Python package manager, why do we need Anaconda? Couldn't we all simply run `pip install` for each of the 125+ libraries and have them work together nicely? Or would they not work together nicely, meaning that Anaconda has done us all a big favour by sorting out the issues that arise when trying to get 125+ libraries to interact?
Three fundamental reasons:
- Most of these libraries require linking against system-installed libraries (say, HDF5 for PyTables or ATLAS for NumPy) that the user may or may not be aware of. Matplotlib, for example, needs a number of different graphical libraries, and certain backends will crash if they are missing.
- pip compiles libraries from source (though wheels let you skip this step when they are available). Compiling requires a C compiler (difficult on Windows) and a Fortran compiler (difficult on macOS and Windows), and it takes a long time for big libraries like SciPy.
- Anaconda's anaconda metapackage is a pinned set of libraries that Continuum has verified play well together. In an ideal world we would always use the latest and most improved version of everything, but that can lead to incompatibilities. (The snippet after this list illustrates all three points.)
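To make these concrete, here is a minimal shell sketch. The environment name `science` is just a placeholder; the rest uses standard pip and conda commands:

```bash
# 1. Inspect which system BLAS/LAPACK libraries NumPy was linked against
python -c "import numpy; numpy.show_config()"

# 2. Tell pip to install prebuilt wheels only, and fail rather than compile from source
pip install --only-binary :all: numpy scipy

# 3. Create an environment from the curated anaconda metapackage
conda create -n science anaconda
```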
And a complementary point:
- It is easy to use conda to create a set of packages for distribution, so you can share your package together with all of its dependencies (see the sketch below).
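For example, one common workflow is to export an environment to a file and recreate it elsewhere (`environment.yml` is conda's conventional file name for this):

```bash
# Export the active environment, pinning exact versions of every dependency
conda env export > environment.yml

# A collaborator recreates the same environment from the file
conda env create -f environment.yml
```

Because the export lists every dependency with its version, the recipient gets the same stack without having to resolve anything by hand.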