
How to get egg or wheel file of pip-installed python package?

I have an import error on Spark executors similar to the one described in "ImportError: No module named numpy on spark workers", just with psycopg2.

Here it says "Although pandas is too complex to distribute as a *.py file, you can create an egg for it and its dependencies and send that to executors".

So the question is "How do I create an egg file from a package and its dependencies?" Or a wheel, in case eggs are legacy. Is there any command for this in pip?

asked Oct 10 '17 11:10 by Bunyk



2 Answers

You want to make a wheel. Wheels are newer and more robust than eggs, and they are supported by both Python 2 and Python 3.

For something as popular as numpy, you don't need to bother making the wheel yourself. The numpy maintainers publish wheels as part of their distribution, so you can just download one. Many Python libraries ship a wheel this way. See here: https://pypi.python.org/pypi/numpy

If you're curious, see here how to make one in general: https://pip.pypa.io/en/stable/reference/pip_wheel/.
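As a rough sketch of what that looks like for the psycopg2 case from the question: `pip wheel` produces wheels for a requirement and, by default, for its dependencies too, and writes them into a directory. The helper below only assembles that command and optionally runs it. The `wheelhouse` directory name and both function names are assumptions made for illustration.

```python
import subprocess
import sys


def pip_wheel_command(requirement, wheel_dir="wheelhouse", with_deps=True):
    """Build the argument list for `pip wheel`; nothing is executed here."""
    # Using `python -m pip` ties the call to the interpreter running this script.
    cmd = [sys.executable, "-m", "pip", "wheel", "--wheel-dir", wheel_dir]
    if not with_deps:
        # --no-deps restricts the output to the named requirement only.
        cmd.append("--no-deps")
    cmd.append(requirement)
    return cmd


def build_wheels(requirement, wheel_dir="wheelhouse"):
    """Run `pip wheel`; this needs network access or a reachable package index."""
    subprocess.run(pip_wheel_command(requirement, wheel_dir), check=True)
```

Calling `build_wheels("psycopg2")` should leave the .whl files in `wheelhouse`. Wheels that contain compiled extensions are platform-specific, so it is probably safest to build them on a machine that matches the executors.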

Alternatively, you could just install numpy on your target workers.

EDIT:

After your comments, I think it's pertinent to mention the pipdeptree utility. If you need to see by hand what the pip dependencies are, this utility will list them for you. Here's an example:

$ pipdeptree
3to2==1.1.1
anaconda-navigator==1.2.1
ansible==2.2.1.0
  - jinja2 [required: <2.9, installed: 2.8]
    - MarkupSafe [required: Any, installed: 0.23]
  - paramiko [required: Any, installed: 2.1.1]
    - cryptography [required: >=1.1, installed: 1.4]
      - cffi [required: >=1.4.1, installed: 1.6.0]
        - pycparser [required: Any, installed: 2.14]
      - enum34 [required: Any, installed: 1.1.6]
      - idna [required: >=2.0, installed: 2.1]
      - ipaddress [required: Any, installed: 1.0.16]
      - pyasn1 [required: >=0.1.8, installed: 0.1.9]
      - setuptools [required: >=11.3, installed: 23.0.0]
      - six [required: >=1.4.1, installed: 1.10.0]
    - pyasn1 [required: >=0.1.7, installed: 0.1.9]
  - pycrypto [required: >=2.6, installed: 2.6.1]
  - PyYAML [required: Any, installed: 3.11]
  - setuptools [required: Any, installed: 23.0.0]
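If installing pipdeptree is not an option, the direct dependencies of a single installed package can also be read from its metadata with the standard library. This is a minimal sketch, assuming Python 3.8 or newer for `importlib.metadata`; the helper names are hypothetical, and unlike pipdeptree it shows only one level of the tree.

```python
import re
from importlib import metadata


def requirement_name(requirement):
    """Extract the bare project name from a requirement string such as 'six>=1.4.1'."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    if match is None:
        raise ValueError("not a requirement string: %r" % requirement)
    return match.group(0)


def direct_requirements(dist_name):
    """Return the sorted names of the packages that dist_name declares as requirements.

    Requirements that only apply to optional extras or specific platforms are
    included as well, because environment markers are not evaluated here.
    """
    try:
        declared = metadata.requires(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return []
    # requires() returns None when the package declares no requirements.
    return sorted({requirement_name(r) for r in (declared or [])})
```

For example, `direct_requirements("paramiko")` would list the names one level below paramiko in the tree above, provided paramiko is installed.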

If you're using PySpark and need to package your dependencies, pip can't do this for you automatically. PySpark has its own dependency management that pip knows nothing about. The best you can do is list the dependencies and shove them over by hand, as far as I know.

Additionally, PySpark itself doesn't depend on numpy or psycopg2, so pip can't possibly tell you that you need them if all you give pip is your version of PySpark. You introduced that dependency, so you're responsible for providing it to PySpark.

As a side note, we use bootstrap scripts that install our dependencies (like numpy) before we boot our clusters. It seems to work well. That way you list the libs you need once in a script, and then you can forget about it.
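A bootstrap step of that kind might look roughly like the sketch below. It is not the script we actually use; the function names are hypothetical, and it assumes each library's import name matches its pip name, which holds for numpy and psycopg2 but not for every package.

```python
import importlib.util
import subprocess
import sys


def missing_modules(module_names):
    """Return the top-level module names that cannot be imported on this node."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def pip_install_command(packages):
    """Build the `pip install` argument list for the given packages."""
    return [sys.executable, "-m", "pip", "install", *packages]


def bootstrap(module_names):
    """Install whatever is missing and return the list of names that were missing."""
    missing = missing_modules(module_names)
    if missing:
        # Needs network access or a local package index on the node.
        subprocess.run(pip_install_command(missing), check=True)
    return missing
```

Running `bootstrap(["numpy", "psycopg2"])` on each node before the cluster starts keeps the list of required libraries in one place.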

HTH.

answered Sep 18 '22 12:09 by Matt Messersmith


You can install wheel using pip install wheel.

Then create a .whl by running python setup.py bdist_wheel from the root directory of the Python package. You'll find the result in the dist directory there. If the package is pure Python, you can also pass --universal to get a single .whl file that works for both Python 2 and Python 3.
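The name of the file that lands in dist tells you what kind of wheel you built: a universal wheel is tagged py2.py3-none-any, while a wheel with compiled code carries interpreter and platform tags. Here is a small sketch that reads those tags from the filename, following the wheel naming convention of name, version, optional build tag, Python tag, ABI tag and platform tag. The helper names and the example filenames are made up for illustration.

```python
from collections import namedtuple

WheelTags = namedtuple("WheelTags", "name version python abi platform")


def parse_wheel_filename(filename):
    """Split a wheel filename into its name, version and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # An optional build tag sits between the version and the Python tag.
        del parts[2]
    if len(parts) != 5:
        raise ValueError("unexpected wheel filename: %r" % filename)
    return WheelTags(*parts)


def is_universal(filename):
    """True if the wheel claims to work on both Python 2 and 3 on any platform."""
    tags = parse_wheel_filename(filename)
    pythons = set(tags.python.split("."))
    return {"py2", "py3"} <= pythons and tags.abi == "none" and tags.platform == "any"
```

A check like `is_universal(...)` on each file in dist is a quick way to confirm that a wheel can be shipped to workers regardless of their platform.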

More info on wheel.

answered Sep 19 '22 12:09 by ritiek