We are running a spark-submit command on a Python script that uses Spark to parallelize object detection with Caffe. The detection code runs perfectly fine as a plain Python script, but it raises an import error once we wrap it in Spark code. I know the Spark code is not the problem, because it works fine on my home machine, but it fails on AWS. I suspect this has something to do with the environment variables; it is as if Spark doesn't pick them up.
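For context, this is roughly the pattern the script follows (the model file names and image list here are placeholders, not our actual code); note that caffe is imported inside the function Spark serializes and ships to the executors, which is exactly the import that fails:

from pyspark import SparkContext

def detect_partition(image_paths):
    import caffe  # imported on the executor; this is the import that fails
    net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)
    for path in image_paths:
        # ... preprocess the image and run net.forward() here ...
        yield path

sc = SparkContext(appName='caffe-detection')
image_list = ['img0.jpg', 'img1.jpg']  # placeholder paths
results = sc.parallelize(image_list).mapPartitions(detect_partition).collect()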
These environment variables are set:
SPARK_HOME=/opt/spark/spark-2.0.0-bin-hadoop2.7
PATH=$SPARK_HOME/bin:$PATH
PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
PYTHONPATH=/opt/caffe/python:${PYTHONPATH}
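These are exported in the shell that launches spark-submit; one thing I am unsure about is whether the executor processes inherit PYTHONPATH at all. A sketch of forcing it through SparkConf instead (untested on our cluster):

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName('caffe-detection')
        .setExecutorEnv('PYTHONPATH', '/opt/caffe/python'))
sc = SparkContext(conf=conf)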
Error:
16/10/03 01:36:21 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 172.31.50.167): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/opt/spark/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 161, in main
func, profiler, deserializer, serializer = read_command(pickleSer, infile)
File "/opt/spark/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 54, in read_command
command = serializer._read_with_length(file)
File "/opt/spark/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/serializers.py", line 164, in _read_with_length
return self.loads(obj)
File "/opt/spark/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/serializers.py", line 422, in loads
return pickle.loads(obj)
File "/opt/spark/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 664, in subimport
__import__(name)
ImportError: ('No module named caffe', <function subimport at 0x7efc34a68b90>, ('caffe',))
Does anyone know why this would be an issue?
This package from Yahoo does roughly what we're trying to do: it ships Caffe to the executors as a jar dependency and then uses it from Python. But I haven't found any resources on how to build and import it ourselves:
https://github.com/yahoo/CaffeOnSpark
You probably haven't compiled the Caffe Python wrappers (pycaffe) in your AWS environment. For reasons that completely escape me (and several others, see https://github.com/BVLC/caffe/issues/2440), pycaffe is not available as a PyPI package, so you have to compile it yourself. Follow the compilation/make instructions in the official installation guide, or automate them with .ebextensions if you are in an AWS Elastic Beanstalk environment: http://caffe.berkeleyvision.org/installation.html#python
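Once pycaffe is built on every node, here is a quick throwaway check that each executor can actually import it (it assumes an existing SparkContext named sc, as in your script):

def probe(_):
    try:
        import caffe
        return [('ok', caffe.__file__)]
    except ImportError as exc:
        return [('missing', str(exc))]

# one task per partition; bump the count to cover all executor slots
print(sc.parallelize(range(8), 8).mapPartitions(probe).collect())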