 

Parallel Pip install

Our Django project is getting huge. We have hundreds of apps and use a ton of third-party Python packages, many of which need C extensions compiled. Our deployments take a long time whenever we have to create a new virtual environment for a major release, so I'm looking to speed things up, starting with pip. Does anyone know of a fork of pip that installs packages in parallel?

Steps I've taken so far:

  • I've looked for a project that does exactly this, with little success. I did find this GitHub Gist: https://gist.github.com/1971720 but its results are almost exactly the same as plain, single-threaded pip.

  • I then found the pip project on GitHub and started looking through the network of forks to see if I could find any commits that mentioned doing what I'm trying to do. It's a mess in there. I will fork it and try to parallelize it myself if I have to, but I'd rather not spend the time.

  • I saw a DjangoCon 2011 talk by ep.io explaining their deployment setup. They mentioned parallelizing pip, shipping prebuilt .so files instead of compiling C, and mirroring PyPI, but they didn't say how they did it or what tools they used.

Asked Jun 13 '12 at 18:06 by Kyle




1 Answer

Parallel pip installation

This example uses xargs to run several pip installs at once, which can speed up the build by up to roughly 4x with four processes. You can raise the parallelism with --max-procs below (keep it roughly equal to your number of cores).

If you're trying to speed up an imaging process that you run over and over, it may be easier, and will certainly use less bandwidth, to image the finished result directly instead of reinstalling every time, or to build your image with pip install --target (-t) or a virtualenv.
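The idea of reusing a built result instead of reinstalling every time can be sketched as a small cache of virtualenvs keyed on the contents of the requirements file, so a new environment is only built when the requirements actually change. This is a minimal sketch, not the answerer's method; it assumes a POSIX shell, python3 with the standard venv module, and a hypothetical cache directory variable ENV_CACHE (defaulting to ~/.cache/venvs). The function names env_dir_for and ensure_env are illustrative only.

```shell
#!/bin/sh
# Sketch: build one virtualenv per distinct requirements file and reuse it.
# ENV_CACHE is a hypothetical cache directory for this sketch.

# env_dir_for REQUIREMENTS_FILE: print the cache path for this file's contents.
env_dir_for() {
  # POSIX cksum prints "CRC SIZE" for stdin; this is a cache key,
  # not a security check.
  key=$(cksum < "$1" | awk '{ print $1 "-" $2 }')
  echo "${ENV_CACHE:-$HOME/.cache/venvs}/$key"
}

# ensure_env REQUIREMENTS_FILE: build the env if needed, then print its path.
ensure_env() {
  reqs=$1
  dir=$(env_dir_for "$reqs")
  # Virtualenvs hard-code their own absolute path, so build in place rather
  # than renaming a temp dir. The .complete marker is written only after a
  # successful install, so an interrupted build is wiped and redone next run.
  if [ ! -f "$dir/.complete" ]; then
    rm -rf "$dir"
    python3 -m venv "$dir" &&
      "$dir/bin/pip" install -r "$reqs" &&
      touch "$dir/.complete" || return 1
  fi
  echo "$dir"
}
```

Typical use would be env=$(ensure_env requirements.txt) && . "$env/bin/activate". A checksum of the file is used rather than its modification time so that an unchanged file copied onto a fresh build machine still maps to the same cached environment.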

Download and install packages in parallel, four at a time:

xargs --max-args=1 --max-procs=4 sudo pip install < requires.txt 

Note: xargs options differ between implementations. The long forms --max-args and --max-procs are GNU extensions; BSD and macOS xargs accept only the short forms -n and -P. Check your system's man page for specifics.
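The one-liner hands every whitespace-separated word of requires.txt to its own pip process, so comment lines, blank lines, or a specifier such as "redis >= 3.0" can trip it up. A slightly more defensive sketch is below, assuming one plain requirement per line (no pip options like -r or -e). It uses only the short xargs options, plus -0, which GNU and BSD/macOS xargs both accept, and guesses the job count from getconf. The function names and the PIP override variable are hypothetical, introduced for illustration.

```shell
#!/bin/sh
# Sketch: run "pip install" on each requirement, several at a time.
# Assumes one plain requirement per line (no pip options such as -r or -e).
# PIP is a hypothetical override; it is left unquoted on purpose so a value
# like "sudo pip" or "python3 -m pip" splits into separate words.

# Print usable requirement lines: strip leading/trailing whitespace and
# comments (a "#" at line start or after whitespace), then drop empty lines.
list_requirements() {
  sed -e 's/^[[:space:]]*//' \
      -e 's/^#.*//' \
      -e 's/[[:space:]]#.*//' \
      -e 's/[[:space:]]*$//' "$1" |
    grep -v '^$'
}

# Guess a job count from the number of online CPUs, falling back to 4.
default_jobs() {
  getconf _NPROCESSORS_ONLN 2>/dev/null || echo 4
}

# parallel_pip_install REQUIREMENTS_FILE [JOBS]
parallel_pip_install() {
  reqs=$1
  jobs=${2:-$(default_jobs)}
  # tr + xargs -0 pass each whole line as one argument, so "redis >= 3.0"
  # stays intact; -n 1 gives each pip one requirement, -P caps concurrency.
  list_requirements "$reqs" |
    tr '\n' '\000' |
    xargs -0 -n 1 -P "$jobs" ${PIP:-pip} install
}
```

Because sudo cannot run a shell function, root installs would go through the override instead, e.g. PIP="sudo pip" parallel_pip_install requires.txt 4.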

Same thing inlined using a here-doc:

cat << EOF | xargs --max-args=1 --max-procs=4 sudo pip install
awscli
bottle
paste
boto
wheel
twine
markdown
python-slugify
python-bcrypt
arrow
redis
psutil
requests
requests-aws
EOF

Warning: there is a small chance that running installs concurrently confuses package manifests (depending on your distribution) if multiple pip processes try to install the same dependency at exactly the same time, though it's unlikely if you're only running 4 at a time. It can be fixed fairly easily by reinstalling the affected dependency, e.g. sudo pip install --force-reinstall depname.
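One way to sidestep that race entirely, sketched here as an alternative technique rather than the answer's own: do the slow part, building wheels for packages that need C compiled, in parallel with pip wheel, then run a single ordinary pip install that can use those prebuilt wheels. Only one process ever writes into site-packages. This is a minimal sketch under assumptions: one plain requirement per line, a hypothetical WHEELHOUSE directory (default ./wheelhouse), and the same kind of hypothetical PIP override. Because of --no-deps, dependencies not listed in the file are still resolved, and compiled if needed, in the serial phase.

```shell
#!/bin/sh
# Sketch: compile in parallel, install serially.
# Phase 1 builds a wheel per requirement concurrently with "pip wheel";
# phase 2 runs one ordinary "pip install", so only a single process ever
# writes into site-packages. WHEELHOUSE and PIP are hypothetical overrides.

# build_wheels_parallel REQUIREMENTS_FILE [JOBS]
build_wheels_parallel() {
  reqs=$1
  jobs=${2:-4}
  wheelhouse=${WHEELHOUSE:-./wheelhouse}
  mkdir -p "$wheelhouse"
  # --no-deps: each process builds only its own package, so two processes
  # never build the same shared dependency into the same file at once.
  grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$reqs" |
    tr '\n' '\000' |
    xargs -0 -n 1 -P "$jobs" ${PIP:-pip} wheel --no-deps -w "$wheelhouse"
}

# install_from_wheelhouse REQUIREMENTS_FILE
install_from_wheelhouse() {
  reqs=$1
  wheelhouse=${WHEELHOUSE:-./wheelhouse}
  # A single pip process resolves everything; --find-links lets it use the
  # wheels built above, and anything missing (such as transitive
  # dependencies) is fetched from the package index as usual.
  ${PIP:-pip} install --find-links "$wheelhouse" -r "$reqs"
}
```

A run would be build_wheels_parallel requires.txt 4 && install_from_wheelhouse requires.txt. Keeping the wheelhouse between deployments also gives a rough version of the "ship compiled artifacts instead of compiling C" idea from the question.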

Answered Sep 19 '22 at 04:09 by fatal_error