 

Install python package to PySpark Kernel in Sagemaker Notebooks

Has anyone figured out how to install packages on AWS SageMaker notebook instances so that they are available in the PySpark kernel? I have made several attempts, including lifecycle configuration scripts, but I seem to keep missing the right Python environment. The package in question is joblib, though I assume the specific package shouldn't matter.

gapvision asked Apr 08 '26 00:04

1 Answer

Thanks for using Amazon SageMaker!

Unlike the other kernels, the PySpark kernel only runs once there is an EMR cluster to connect to, whereas the lifecycle configuration runs before the notebook instance is put InService. You therefore cannot use a lifecycle configuration to install packages for the PySpark kernel; packages can only be installed after the kernel has started and connected to the EMR cluster.

To install packages in the PySpark kernel, run pip install <package_name> once the kernel has started; the command executes on the EMR cluster's master node.
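As a minimal sketch of that step, the cell below wraps the pip invocation in a small helper. The helper name `install_package` is hypothetical, not part of any SageMaker or Spark API; it simply shells out to pip for the current interpreter, which on a connected PySpark (Sparkmagic) kernel would run on the EMR master, as the answer describes.

```python
import subprocess
import sys

def install_package(package: str) -> bool:
    """Run `pip install <package>` for the current Python interpreter.

    Hypothetical helper: on a SageMaker PySpark kernel with a live EMR
    connection, this cell body executes on the cluster's master node,
    so the package (e.g. "joblib" from the question) lands there.
    """
    result = subprocess.run(
        [sys.executable, "-m", "pip", "install", package],
        capture_output=True,
        text=True,
    )
    # pip exits with 0 on success (including "already satisfied")
    return result.returncode == 0
```

Note that this installs only on the master node; if the package must be importable inside Spark executors on all worker nodes, a cluster-wide mechanism (such as an EMR bootstrap action) is needed instead.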

Thanks,

Neelam

Neelam Gehlot answered Apr 10 '26 18:04

