How to use external libraries with virtualenv? [duplicate]

I'm trying to figure out how to use external libraries with Spark. I have a program that runs successfully on Spark, and now I am trying to import external libraries into it. I'm using virtualenv, and every time I submit the job, Spark complains that it cannot find the module.

Here is one of many submit commands I have tried:

/path/to/spark-1.1.0-bin-hadoop2.4/bin/spark-submit ua_analysis.py --py-files `pwd`/venv/lib/python2.7/site-packages
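For reference, spark-submit's usage is "spark-submit [options] <python file> [app arguments]", so a flag placed after ua_analysis.py, as in the command above, is handed to the script rather than to Spark. --py-files also expects a comma-separated list of .zip, .egg, or .py files, not a directory. A minimal sketch of bundling a site-packages directory into a single zip for --py-files, assuming the venv layout from the question; zip_site_packages is a hypothetical helper, and packages with compiled C extensions generally cannot be imported from a zip this way:

```python
import os
import shutil


def zip_site_packages(site_packages_dir, output_base):
    """Zip the *contents* of a site-packages directory (hypothetical helper).

    Packages land at the root of the archive, which is the layout Python's
    zipimport needs once the archive is on sys.path. Returns the archive path.
    """
    if not os.path.isdir(site_packages_dir):
        raise ValueError("not a directory: %s" % site_packages_dir)
    # root_dir makes every path inside the archive relative to site-packages,
    # e.g. "mypkg/__init__.py" rather than "venv/lib/.../mypkg/__init__.py".
    return shutil.make_archive(output_base, "zip", root_dir=site_packages_dir)
```

For example, zip_site_packages("venv/lib/python2.7/site-packages", "deps") would write deps.zip, which could then be passed with the option ahead of the script: spark-submit --py-files deps.zip ua_analysis.py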

I have tried adding the files individually with the --py-files flag, and I've also tried the following subdirectories:

venv/lib
venv/python2.7
venv/lib/python2.7/site-packages/<package_name>

All of these produce the following error:

ImportError: ('No module named <module>', <function subimport at 0x7f287255dc80>, (<module>,))

    org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:124)
    org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:154)
    org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:87)
....

I've also tried copying these files into the pyspark directory, without success.
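The traceback above shows the ImportError surfacing inside PythonRDD.compute, that is, in a Python worker running a task, so the module has to be importable by the executors' Python and not only on the driver. A small check for which modules an interpreter can locate, written for Python 3 (importlib.util); missing_modules is a hypothetical helper:

```python
import importlib.util


def missing_modules(module_names):
    """Return the names in module_names that this interpreter cannot locate.

    Each module is only located, not executed (for dotted names such as
    "a.b", the parent package "a" does get imported).
    """
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except ImportError:  # e.g. the parent package of "a.b" is absent
            spec = None
        if spec is None:
            missing.append(name)
    return missing
```

Run locally, it reports on the driver's environment. To ask the executors instead, it could be called inside a task, for example sc.parallelize([0], 1).map(lambda _: missing_modules(["<module>"])).collect(), assuming an existing SparkContext named sc and keeping <module> as the placeholder from the error.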

Peter Klipfel asked Sep 03 '25 01:09



1 Answer

When you create the virtualenv, pass the --system-site-packages option:

virtualenv --system-site-packages venv
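For Python 3 environments, the standard-library venv module exposes the same switch. A minimal sketch, assuming Python 3; create_env_with_system_packages is a hypothetical helper, and this is a stdlib counterpart of the virtualenv command above rather than what the answer itself uses:

```python
import os
import venv


def create_env_with_system_packages(env_dir, with_pip=False):
    """Create a venv at env_dir that can also import the system site-packages.

    Rough stdlib counterpart of "virtualenv --system-site-packages venv".
    with_pip=True would additionally bootstrap pip into the env via ensurepip.
    """
    builder = venv.EnvBuilder(
        system_site_packages=True,
        with_pip=with_pip,
        # mirror "python -m venv", which symlinks on non-Windows platforms
        symlinks=(os.name != "nt"),
    )
    builder.create(env_dir)
    return env_dir
```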

If you forgot to pass the option, delete the marker file instead:

rm venv/lib/python2.7/no-global-site-packages.txt
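That marker file belongs to older virtualenv releases. Environments created by Python 3's venv module, and by recent virtualenv releases, record the setting in pyvenv.cfg under the key include-system-site-packages instead. A minimal sketch of flipping it, assuming that layout; enable_system_site_packages is a hypothetical helper:

```python
import os

KEY = "include-system-site-packages"


def enable_system_site_packages(env_dir):
    """Set include-system-site-packages = true in env_dir/pyvenv.cfg.

    Python's site module reads this key at interpreter startup, so the change
    applies to the next interpreter launched from the environment.
    """
    cfg_path = os.path.join(env_dir, "pyvenv.cfg")
    with open(cfg_path) as f:
        lines = f.readlines()
    found = False
    for i, line in enumerate(lines):
        # site.py compares the key case-insensitively, ignoring surrounding spaces
        if line.partition("=")[0].strip().lower() == KEY:
            lines[i] = "%s = true\n" % KEY
            found = True
    if not found:
        if lines and not lines[-1].endswith("\n"):
            lines[-1] += "\n"
        lines.append("%s = true\n" % KEY)
    with open(cfg_path, "w") as f:
        f.writelines(lines)
    return cfg_path
```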

Either way, packages installed in the system site-packages directory can be imported from inside the virtualenv.

kev answered Sep 04 '25 14:09
