I am trying to submit a job to an EMR cluster via Livy. My Python script (to submit the job) requires importing a few packages. I have installed all of those packages on the master node of the EMR cluster. The main script resides on S3 and is referenced by the script that submits the job to Livy from EC2. Every time I try to run the job from a remote machine (EC2), it dies with import errors (no module named [mod name]).
I have been stuck on this for more than a week and have been unable to find a solution. Any help would be highly appreciated. Thanks.
Are the packages you are trying to import custom packages? If so, how did you package them? Did you create a wheel file or zip file and pass it as --py-files in your spark-submit via Livy?
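For reference, here is a rough sketch of how dependencies can be passed in a Livy batch request. Livy's POST /batches endpoint accepts a "file" field for the main script and a "pyFiles" list, which plays the role of --py-files. The bucket paths, the helper names build_batch_payload and submit_batch, and the host placeholder are made up for illustration; 8998 is Livy's default port, but your setup may differ.

```python
import json
import urllib.request


def build_batch_payload(main_script, py_files, args=None):
    """Build the JSON body for Livy's POST /batches endpoint."""
    payload = {
        "file": main_script,         # main PySpark script, e.g. on S3
        "pyFiles": list(py_files),   # dependencies shipped with the job
    }
    if args:
        payload["args"] = list(args)  # command-line arguments for the script
    return payload


def submit_batch(livy_url, payload):
    """POST the payload to Livy and return the parsed JSON response."""
    request = urllib.request.Request(
        livy_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical S3 paths; replace with your own.
    payload = build_batch_payload(
        "s3://my-bucket/jobs/main.py",
        ["s3://my-bucket/jobs/deps.zip"],
    )
    print(json.dumps(payload, indent=2))
    # Uncomment to actually submit (assumes Livy is reachable from this host):
    # submit_batch("http://<emr-master-dns>:8998", payload)
```

Keeping the payload builder separate from the HTTP call makes it easy to print and inspect exactly what is being sent before anything reaches the cluster.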
Possible problem.
You installed the packages only on the master node. You will need to log in to your worker nodes and install the packages there too. Alternatively, when you provision the EMR cluster, install the packages using bootstrap actions, which run on every node.
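As a sketch of the bootstrap-action route: when the cluster is created through the EMR API (for example with boto3's run_job_flow), each bootstrap action is described by a dict with a "Name" and a "ScriptBootstrapAction" holding the script's S3 "Path" and its "Args". The script path, the helper name build_bootstrap_action, and the package names below are all hypothetical; the script itself is assumed to be a small shell script you upload that pip-installs whatever it receives as arguments.

```python
def build_bootstrap_action(script_s3_path, packages, name="Install Python packages"):
    """Build one entry for the BootstrapActions list of an EMR cluster request."""
    return {
        "Name": name,
        "ScriptBootstrapAction": {
            "Path": script_s3_path,   # shell script stored on S3 (hypothetical)
            "Args": list(packages),   # passed to the script as arguments
        },
    }


if __name__ == "__main__":
    # Hypothetical script location and package names.
    action = build_bootstrap_action(
        "s3://my-bucket/bootstrap/install_packages.sh",
        ["numpy", "requests"],
    )
    print(action)
    # With boto3 this dict would go into the cluster request, roughly:
    # emr_client.run_job_flow(..., BootstrapActions=[action], ...)
```

Bootstrap actions only run when a node is provisioned, so this applies to a new cluster; on an already running cluster the packages still have to be installed on each worker by hand.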
You should be able to add libraries via the --py-files option, but it's safer to just download the wheel files and use them rather than zipping anything yourself.