Has anybody figured out how to install packages on AWS SageMaker Notebook instances so that they are available in the PySpark kernel? I've made several attempts now, including lifecycle scripts, but it seems I keep missing the right Python environment. The package in question is joblib, but I guess that shouldn't matter?!
Thanks for using Amazon SageMaker!
The PySpark kernel, unlike other kernels, only runs when there is an EMR cluster to connect to, whereas the Lifecycle Configuration runs before the Notebook Instance enters the InService state. So you cannot use a Lifecycle Configuration to install packages into the PySpark kernel; packages can only be installed after the kernel has started and connected to the EMR cluster.
To install packages into the PySpark kernel, run pip install <package_name> once the kernel has started; the command executes on the EMR cluster's master node.
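As a minimal sketch of that approach: since cells in the PySpark kernel execute as Python on the EMR master, one way to invoke pip from inside a cell is via the standard library's subprocess module. The helper names below (`pip_install_cmd`, `pip_install`) are illustrative, not part of any SageMaker or EMR API:

```python
import subprocess
import sys

def pip_install_cmd(package):
    """Build the pip command for the Python interpreter running this kernel.

    Using `sys.executable -m pip` ensures the package is installed into the
    same environment the PySpark kernel is using, which (in a SageMaker
    PySpark kernel) is the interpreter on the EMR master node.
    """
    return [sys.executable, "-m", "pip", "install", package]

def pip_install(package):
    """Run pip install for `package` and raise if it fails."""
    subprocess.check_call(pip_install_cmd(package))

# Example (run inside a PySpark kernel cell once the EMR connection is up):
# pip_install("joblib")
```

Note that this installs the package only on the node where the code runs; if worker nodes also need the library (e.g. inside UDFs), it must be installed there as well, for example via an EMR bootstrap action.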
Thanks,
Neelam