
Python packages not importing in AWS EMR

I am trying to submit a job to an EMR cluster via Livy. My Python script (to submit the job) imports a few packages, and I have installed all of them on the master node of the EMR cluster. The main script resides on S3 and is referenced by the script that submits the job to Livy from EC2. Every time I try to run the job from the remote machine (EC2), it dies with import errors ("No module named [mod name]").

I have been stuck on this for more than a week and have been unable to find a solution. Any help would be highly appreciated. Thanks.

Shweta asked Feb 04 '26 07:02


1 Answer

Are the packages you are trying to import custom packages? If so, how did you package them? Did you create a wheel file or zip file and pass it as --py-files in your spark-submit via Livy?
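For a Livy batch submission, the equivalent of --py-files is the pyFiles field in the JSON body sent to POST /batches. Here is a minimal sketch of building and posting that body, using only the standard library. The bucket, file names, and the build_livy_batch and submit_batch helpers are hypothetical. The Livy URL is also an assumption (Livy on EMR usually listens on port 8998 of the master node).

```python
import json
import urllib.request


def build_livy_batch(main_script, py_files, name="emr-job"):
    """Build the JSON body for Livy's POST /batches endpoint."""
    return {
        "file": main_script,        # main PySpark script, e.g. on S3
        "pyFiles": list(py_files),  # extra Python files shipped with the job
        "name": name,
    }


def submit_batch(livy_url, payload):
    """POST the payload to Livy and return the parsed JSON response."""
    request = urllib.request.Request(
        livy_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical paths; replace with your own bucket and files.
payload = build_livy_batch(
    "s3://my-bucket/jobs/main.py",
    ["s3://my-bucket/deps/mypackage.zip"],
)
print(json.dumps(payload, indent=2))

# To actually submit (assumed URL, not called here):
# submit_batch("http://<emr-master-dns>:8998", payload)
```

Files listed in pyFiles are distributed with the job, so the executors can import them without the package being installed on each node.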

Possible problem:

You installed the packages only on the master node. You will need to log in to your worker nodes and install the packages there too. Alternatively, install the packages with bootstrap actions when you provision the EMR cluster.
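The bootstrap-action route can be sketched as follows. A bootstrap action runs a script on every node while the cluster is provisioned, so the packages end up on the workers as well as the master. The dictionary below follows the shape that boto3's EMR run_job_flow call accepts for its BootstrapActions parameter. The S3 path, the package names, and the build_bootstrap_action helper are hypothetical, and the script itself is assumed to run something like "sudo python3 -m pip install" on its arguments.

```python
def build_bootstrap_action(script_s3_path, packages):
    """Describe one bootstrap action in the shape used by BootstrapActions."""
    return {
        "Name": "install-python-packages",
        "ScriptBootstrapAction": {
            "Path": script_s3_path,    # shell script stored on S3 (hypothetical)
            "Args": list(packages),    # passed to the script as "$@"
        },
    }


# Hypothetical script location and package list.
bootstrap_actions = [
    build_bootstrap_action(
        "s3://my-bucket/bootstrap/install_packages.sh",
        ["numpy", "requests"],
    )
]
print(bootstrap_actions)
```

This list would be passed as BootstrapActions=bootstrap_actions when creating the cluster. Bootstrap actions only run at provisioning time, so an already running cluster would need the packages installed manually on each node or be recreated.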

You should be able to add libraries via the --py-files option, but it's safer to download the wheel files and use them rather than zipping anything yourself.

Emerson answered Feb 06 '26 20:02