Our Django project is getting huge. We have hundreds of apps and use a ton of third-party Python packages, many of which need C extensions compiled. Our deployments take a long time when we need to create a new virtual environment for major releases. With that said, I'm looking to speed things up, starting with pip. Does anyone know of a fork of pip that will install packages in parallel?
Steps I've taken so far:
I've looked for a project that does just this, with little success. I did find this GitHub Gist: https://gist.github.com/1971720 but the results are almost exactly the same as our single-threaded friend.
I then found the pip project on GitHub and started looking through the network of forks to see if I could find any commits that mentioned doing what I'm trying to do. It's a mess in there. I will fork it and try to parallelize it myself if I have to; I just want to avoid spending time doing that.
I saw a talk at DjangoCon 2011 from ep.io explaining their deployment stuff, and they mention parallelizing pip, shipping .so files instead of compiling C extensions, and mirroring PyPI, but they didn't touch on how they did it or what they used.
As this discussion is the first hit on Google: pip should not be considered thread-safe. Plus, there is a bug in pip that can raise an exception when pip is invoked from within a thread.
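If you do parallelize, prefer separate pip processes over threads, which sidesteps the thread-safety issue entirely. A minimal sketch of the idea (the package names are placeholders, and `echo pip install` stands in for a real `pip install` so the commands are only printed, not executed):

```shell
# Each install runs as its own OS process started with &, avoiding
# pip's thread-safety problems; wait blocks until every background
# job has exited. "echo pip install" is a stand-in for a real
# "pip install", and the package names are placeholders.
for pkg in requests redis psutil; do
    echo pip install "$pkg" &   # one child process per package
done
wait                            # returns after all jobs finish
```

Because the installs run concurrently, their output can arrive in any order.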
This example uses xargs to parallelize the build process by approximately 4x. You can increase the parallelization factor with max-procs below (keep it approximately equal to your number of cores).
If you're trying to, e.g., speed up an imaging process that you're running over and over, it may be easier, and will certainly use less bandwidth, to image the result once and reuse it rather than reinstalling each time, or to build your image using pip -t or virtualenv.
Download and install packages in parallel, four at a time:
xargs --max-args=1 --max-procs=4 sudo pip install < requires.txt
Note: xargs has different parameter names on different Linux distributions. Check your distribution's man page for specifics.
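For example, the short flags -n (max arguments per command) and -P (max parallel processes) are accepted by both GNU and BSD xargs, unlike the GNU-only long forms above. Substituting echo for pip also gives a handy dry run that previews each command xargs would spawn. The package list below is a placeholder for a real requires.txt:

```shell
# Portable short flags: -n 1 passes one package per pip invocation,
# -P 4 runs up to four invocations in parallel. "echo pip install"
# prints each command instead of executing it (dry run). The three
# packages are placeholders for the contents of requires.txt.
printf '%s\n' requests redis psutil > requires.txt
xargs -n 1 -P 4 echo pip install < requires.txt
```

Once the preview looks right, drop the echo to run the real installs.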
Same thing inlined using a here-doc:
cat << EOF | xargs --max-args=1 --max-procs=4 sudo pip install
awscli
bottle
paste
boto
wheel
twine
markdown
python-slugify
python-bcrypt
arrow
redis
psutil
requests
requests-aws
EOF
Warning: there is a remote possibility that this method might confuse package manifests (depending on your distribution) if multiple pip processes try to install the same dependency at exactly the same time, but it's very unlikely if you're only doing 4 at a time. If it does happen, it can be fixed pretty easily by uninstalling and reinstalling the affected package: pip uninstall depname followed by pip install depname.