I have a similar import error on Spark executors to the one described here, just with psycopg2: ImportError: No module named numpy on spark workers
Here it says "Although pandas is too complex to distribute as a *.py file, you can create an egg for it and its dependencies and send that to executors".
So the question is: how do I create an egg file from a package and its dependencies? Or a wheel, in case eggs are legacy. Is there a command for this in pip?
If you want to list all the Python packages installed in an environment, the pip list command is what you are looking for. It returns every installed package along with its version; add -v if you also want to see where each package is installed.
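For illustration, the output looks roughly like this (the packages and versions are just examples, and the exact layout differs between pip versions):

$ pip list
Package    Version
---------- -------
numpy      1.13.3
pip        9.0.1
setuptools 36.5.1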
You want to be making a wheel. Wheels are newer and more robust than eggs, and they are supported by both Python 2 and 3.
For something as popular as numpy, you don't need to bother making the wheel yourself. The project publishes wheels as part of its distribution, so you can just download one. Many Python libraries ship wheels with their releases. See here: https://pypi.python.org/pypi/numpy
If you're curious, see here how to make one in general: https://pip.pypa.io/en/stable/reference/pip_wheel/.
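If you do want pip to do the work, the pip wheel subcommand documented at that link builds (or fetches) wheels for a requirement and all of its dependencies into a single directory, which you can then ship to the workers. A rough sketch (the wheelhouse directory name is just an example):

$ pip install wheel
$ pip wheel numpy -w wheelhouse
$ ls wheelhouse/    # one .whl per package, including transitive dependencies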
Alternatively, you could just install numpy on your target workers.
EDIT:
After your comments, I think it's pertinent to mention the pipdeptree utility. If you need to see by hand what the pip dependencies are, this utility will list them for you. Here's an example:
$ pipdeptree
3to2==1.1.1
anaconda-navigator==1.2.1
ansible==2.2.1.0
  - jinja2 [required: <2.9, installed: 2.8]
    - MarkupSafe [required: Any, installed: 0.23]
  - paramiko [required: Any, installed: 2.1.1]
    - cryptography [required: >=1.1, installed: 1.4]
      - cffi [required: >=1.4.1, installed: 1.6.0]
        - pycparser [required: Any, installed: 2.14]
      - enum34 [required: Any, installed: 1.1.6]
      - idna [required: >=2.0, installed: 2.1]
      - ipaddress [required: Any, installed: 1.0.16]
      - pyasn1 [required: >=0.1.8, installed: 0.1.9]
      - setuptools [required: >=11.3, installed: 23.0.0]
      - six [required: >=1.4.1, installed: 1.10.0]
    - pyasn1 [required: >=0.1.7, installed: 0.1.9]
  - pycrypto [required: >=2.6, installed: 2.6.1]
  - PyYAML [required: Any, installed: 3.11]
  - setuptools [required: Any, installed: 23.0.0]
If you're using Pyspark and need to package your dependencies, pip can't do this for you automatically. Pyspark has its own dependency management that pip knows nothing about. The best you can do is list the dependencies and shove them over by hand, as far as I know.
Additionally, Pyspark isn't dependent on numpy or psycopg2, so pip can't possibly tell you that you'd need them if all you're telling pip is your version of Pyspark. That dependency has been introduced by you, so you're responsible for giving it to Pyspark.
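For what it's worth, a common way to shove pure-Python dependencies over by hand is to install them into a local directory, zip that up, and pass the archive to spark-submit with --py-files. The file and package names below are placeholders; note that packages with compiled extensions such as numpy or psycopg2 generally won't work this way and need to be installed on the workers instead, as described next:

$ pip install -t deps/ some_pure_python_package
$ cd deps && zip -r ../deps.zip . && cd ..
$ spark-submit --py-files deps.zip my_job.py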
As a side note, we use bootstrap scripts that install our dependencies (like numpy) before we boot our clusters. It seems to work well. That way you list the libs you need once in a script, and then you can forget about it.
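As a sketch, such a script is just a shell script the cluster runs on every node before Spark starts (the exact hook depends on your platform, e.g. an EMR bootstrap action, and the package list is only an example):

#!/bin/bash
# install the Python dependencies our Spark jobs import, on every node
sudo pip install numpy psycopg2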
HTH.
You can install wheel using pip install wheel. Then create a .whl with python setup.py bdist_wheel; you'll find it in the dist directory in the root directory of the Python package. You might also want to pass --universal if you want a single .whl file for both Python 2 and Python 3. More info on wheel.
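For example, run from the directory containing setup.py (mypackage and its version are placeholders; the py2.py3 tag only appears if you pass --universal):

$ pip install wheel
$ python setup.py bdist_wheel --universal
$ ls dist/
mypackage-0.1.0-py2.py3-none-any.whl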