I want to use matplotlib.bblpath or shapely.geometry libraries in pyspark.
When I try to import any of them I get the below error:
>>> from shapely.geometry import polygon
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named shapely.geometry
I know the module isn't present, but how can these packages be brought to my pyspark libraries?
Using virtualenv

Since Python 3.3, a subset of virtualenv's features has been integrated into the standard library as the venv module. PySpark users can use virtualenv to manage Python dependencies in their clusters by packaging the environment with venv-pack, in much the same way as conda-pack.
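As a sketch of that workflow (names like pyspark_venv and app.py are placeholders; the --archives pattern follows the PySpark documentation), you would build and pack the environment on a machine with the same OS/architecture as the cluster, then ship it with spark-submit:

```shell
# Create a virtualenv and install the needed packages plus venv-pack.
python -m venv pyspark_venv
source pyspark_venv/bin/activate
pip install venv-pack shapely

# Pack the environment into a relocatable archive.
venv-pack -o pyspark_venv.tar.gz

# Ship the archive to the cluster; Spark unpacks it into ./environment
# on each node, and PYSPARK_PYTHON points the workers at its interpreter.
export PYSPARK_DRIVER_PYTHON=python
export PYSPARK_PYTHON=./environment/bin/python
spark-submit --archives pyspark_venv.tar.gz#environment app.py
```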
On your SparkContext instance, try:

sc.addPyFile("module.py")  # also accepts .zip archives

Quoting from the docs:
Add a .py or .zip dependency for all tasks to be executed on this SparkContext in the future. The path passed can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), or an HTTP, HTTPS or FTP URI.
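To see why a .zip works, here is a minimal local sketch of the mechanism addPyFile relies on: the archive is placed on sys.path, and Python's zipimport machinery resolves imports from inside it. The module name mymod and its contents are made up for illustration; on a cluster, sc.addPyFile does the shipping and path setup for you on every executor.

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny module and zip it up -- a stand-in for the dependency
# archive you would ship with sc.addPyFile("deps.zip").
tmp = tempfile.mkdtemp()
zip_path = os.path.join(tmp, "deps.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("mymod.py", "def greet():\n    return 'hello from zip'\n")

# Putting the archive on sys.path makes its contents importable,
# which is what addPyFile arranges on each executor.
sys.path.insert(0, zip_path)
import mymod

print(mymod.greet())  # -> hello from zip
```

Note that this only covers pure-Python dependencies; packages with compiled extensions (like shapely's GEOS bindings) are better handled with the virtualenv/venv-pack approach above, since a .zip of .py files cannot carry native libraries.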