I am running a boo.py script on AWS EMR using spark-submit (Spark 2.0).
The script finishes successfully when I run it with plain Python:
python boo.py
However, it fails when I run it with spark-submit:
spark-submit --verbose --deploy-mode cluster --master yarn boo.py
The log retrieved with yarn logs -applicationId ID_number shows:
Traceback (most recent call last):
File "boo.py", line 17, in <module>
import boto3
ImportError: No module named boto3
The Python interpreter and boto3 module I am using are:
$ which python
/usr/bin/python
$ pip install boto3
Requirement already satisfied (use --upgrade to upgrade): boto3 in /usr/local/lib/python2.7/site-packages
How do I add this library path so that spark-submit can find the boto3 module?
When you run Spark, part of your code runs on the driver and part runs on the executors.
Did you install boto3 on the driver only, or on the driver plus all executor nodes that might run your code?
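To illustrate the distinction, here is a minimal PySpark sketch (the SparkContext setup and the list_buckets helper are hypothetical examples, not code from boo.py). The body of a function passed to an RDD operation runs on the executors, so an import boto3 inside it needs boto3 installed on those nodes too:

from pyspark import SparkContext

sc = SparkContext(appName="boto3-example")

def list_buckets(_):
    # This import happens on an executor, so boto3 must be installed there as well.
    import boto3
    s3 = boto3.client("s3")
    return [b["Name"] for b in s3.list_buckets()["Buckets"]]

# Top-level code runs on the driver (which, with --deploy-mode cluster, is itself
# a cluster node), so the driver node also needs boto3 if boo.py imports it at module level.
print(sc.parallelize([0], 1).flatMap(list_buckets).collect())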
One solution is to install boto3 on all executor nodes, for example with a bootstrap action, as sketched below. For details on installing Python modules on Amazon EMR nodes, see:
How to bootstrap installation of Python modules on Amazon EMR?
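A minimal bootstrap action could be a one-line shell script (the file name install-boto3.sh and the S3 path below are placeholders, not something from your setup):

#!/bin/bash
# install-boto3.sh -- EMR runs bootstrap actions on every node (master, core, task)
# before applications start, so boto3 ends up on the driver and all executors.
sudo pip install boto3

Upload it to S3 and attach it when creating the cluster, with something like:

aws emr create-cluster ... --bootstrap-actions Path=s3://your-bucket/install-boto3.sh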