
Trying to install pandas for Pyspark running on Amazon EMR

This question could really apply to any Python package. I have a bootstrap script that runs before my Spark jobs, and I assume I need to install pandas in that script. I've tried several approaches (pip install, easy_install, yum install, etc.), but none of them work: every job fails because pandas cannot be imported in Spark. I'm running EMR v5.12.1 and Python 3.4.

Evan Zamir asked Apr 03 '18 19:04



1 Answer

sudo python3 -m pip install pandas

This is what we have in our bootstrap.sh to install pandas.
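Running pip through python3 -m installs the package for that python3 interpreter, so a likely cause of the import failure is that the package went to a different Python than the one Spark uses. A minimal sketch for checking this, assuming you can run a small script with the same interpreter as your jobs (the helper name missing_packages is made up for illustration and is not part of EMR or Spark):

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names the current interpreter cannot import."""
    # find_spec returns None when a top-level module cannot be located.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # sys.executable is the interpreter running this script; compare it with
    # the one your bootstrap script installed pandas for.
    print("interpreter:", sys.executable)
    print("missing:", missing_packages(["pandas"]))
```

If pandas shows up as missing, the bootstrap step probably installed it for another interpreter, or did not run on that node.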

harmands answered Nov 04 '22 19:11
