With Debian or Ubuntu packages, there is some quality control. Is pip similar, or is it a complete free-for-all? Can anyone upload any code they want under any name they want?
There seem to be some junk packages like https://pypi.python.org/pypi/opencv/0.0.1 which has the same name as a very popular computer vision framework, for example.
To summarize, yes, a few instances of malware have been detected on PyPI. More to the point, there is no protection against malware other than the user's own diligence.
It is good to be aware of what packages you are installing, especially since attackers have a history of using typosquatting on PyPI and other package registries to distribute malware. When working on your own packages, you can avoid pulling a look-alike name from the index by installing from the local source directory with pip install .
By default, pip does not perform any checks to protect against remote tampering and involves running arbitrary code from distributions. It is, however, possible to use pip in a manner that changes these behaviours, to provide a more secure installation mechanism.
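One way pip can be used more securely is its hash-checking mode (pip install --require-hashes -r requirements.txt, with a --hash=sha256:... entry per requirement), which rejects any downloaded file whose digest does not match. The idea behind that check can be sketched by hand as below. This is only an illustration, not pip's own implementation; the function names sha256_of and matches_expected are made up for this example, and the expected digest is assumed to come from a source you already trust.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """True if the file's SHA-256 digest equals the digest we pinned earlier.

    `expected_hex` is assumed to have been recorded from a trusted source
    (for example, a requirements file reviewed when the pin was made).
    """
    actual = sha256_of(path)
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

Note that a matching hash only shows the file is the one that was pinned. It says nothing about whether that file was trustworthy in the first place.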
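You'll have to audit the package (or get someone else to do that) to know if it's secure. There is no easy way around it. All PyPI packages have an MD5 hash attached (link in parentheses after the file), but that is a checksum rather than a signature: it can tell you the download was not corrupted, not that the code is safe.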
No, there are no third-party checks on the code that is uploaded to PyPI (the Python Package Index, which is where pip downloads packages unless explicitly instructed otherwise). The only restriction is that once a package name exists, only the maintainer(s) can upload packages with that name (i.e. you can't submit a malicious upgrade to someone else's package using the same name). It is up to the maintainer to ensure that whatever they make available on PyPI doesn't contain malware, unless they intend for it to be malware, and it is up to each individual developer to be aware of what they are downloading using pip.
This has been exploited in a research project investigating "typosquatting". The researcher uploaded some "simulation malware" (mostly harmless) to PyPI under names that were misspelled versions of popular package names, in order to collect data on how often these misspelled packages were installed. If a black-hat hacker had done the same thing, they could have used much more malicious code.
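As a rough guard against this kind of mistake, a requested name can be compared against a list of packages you actually intend to use before installing. The sketch below uses the standard library's difflib for the comparison. The KNOWN_PACKAGES list, the possible_typosquat function and the 0.8 cutoff are all assumptions chosen for illustration, not part of pip or PyPI.

```python
import difflib
import re

# Hypothetical allow-list: the packages this project is expected to install.
KNOWN_PACKAGES = ["requests", "numpy", "django", "urllib3", "setuptools"]


def normalize(name):
    """Lowercase the name and collapse runs of '-', '_' and '.' into '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def possible_typosquat(name, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return the known package that `name` closely resembles, or None.

    An exact match with a known package is considered fine and returns None.
    """
    normalized = normalize(name)
    known_normalized = [normalize(k) for k in known]
    if normalized in known_normalized:
        return None
    matches = difflib.get_close_matches(normalized, known_normalized, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

A non-None result means the name is suspiciously close to something on the list and deserves a second look before running pip install. It is a heuristic only: it cannot tell whether the similarly named package is actually malicious.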
See also this Security Stack Exchange question on the same topic.
To add to the existing answer, 5 years later:
A piece of software that was downloaded 30,000 times from PyPI was in fact malware: It stole credit card numbers and login credentials and injected malicious code on infected machines.