This question could really apply to any Python package. I have a bootstrap script that runs before my Spark jobs, and I assume I need to install pandas in that script. I've tried several approaches (pip install, easy_install, yum install, etc.), but none of them work: every job fails because pandas cannot be imported in Spark. I'm running EMR v5.12.1 and Python 3.4.
sudo python3 -m pip install pandas
This is what we have written in our bootstrap.sh to install pandas.
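For anyone who wants something a bit more defensive, here is a minimal sketch of how that line could be wrapped in bootstrap.sh. It is only an illustration: the function name `install_python_packages` and the `PYTHON_BIN` and `DRY_RUN` variables are made up for this example, and it assumes the import failure happens because pandas gets installed for a different interpreter than the one Spark runs (the interpreter Spark uses is normally controlled by `PYSPARK_PYTHON`). I have not verified that this is the cause on EMR v5.12.1.

```shell
#!/bin/bash
# Sketch of a bootstrap.sh helper (hypothetical names; adapt to your cluster).

# Interpreter to install into. Assumption: this is the same interpreter
# that Spark is configured to use (for example via PYSPARK_PYTHON).
PYTHON_BIN="${PYTHON_BIN:-python3}"

# Install one or more packages with the chosen interpreter's own pip, then
# check that the first package imports with that same interpreter.
# With DRY_RUN=1 the install command is only printed, not executed.
# Note: the import check assumes the module name equals the package name,
# which holds for pandas but not for every package.
install_python_packages() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "sudo ${PYTHON_BIN} -m pip install $*"
    return 0
  fi
  sudo "${PYTHON_BIN}" -m pip install "$@" || return 1
  # Return non-zero here so a broken install shows up in the bootstrap
  # logs instead of later inside a Spark job.
  "${PYTHON_BIN}" -c "import $1"
}
```

To use it, put `install_python_packages pandas` as the last line of bootstrap.sh. Running the script with `DRY_RUN=1` first prints the exact pip command, which makes it easy to confirm which interpreter it targets.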