
EMR notebooks install additional libraries

I'm having a surprisingly hard time working with additional libraries via my EMR notebook. The AWS interface for EMR allows me to create Jupyter notebooks and attach them to a running cluster. I'd like to use additional libraries in them. SSHing into the machines and installing manually as ec2-user or root will not make the libraries available to the notebook, as it apparently uses the livy user. Bootstrap actions install things for the hadoop user. I can't install from the notebook because its user apparently doesn't have sudo, git, etc., and it probably wouldn't install to the slaves anyway.

What is the canonical way of installing additional libraries for notebooks created through the EMR interface?

Walrus the Cat asked Feb 14 '19 18:02




1 Answer

What is the canonical way of installing additional libraries for notebooks created through the EMR interface?

EMR Notebooks recently launched 'notebook-scoped libraries', which let you install additional Python libraries on your cluster from a public or private PyPI repository and use them within the notebook session.

Notebook-scoped libraries provide the following benefits:

  • You can use libraries in an EMR notebook without having to re-create the cluster or re-attach the notebook to a cluster.
  • You can isolate library dependencies of an EMR notebook to the individual notebook session. The libraries installed from within the notebook cannot interfere with other libraries on the cluster or libraries installed within other notebook sessions.
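As a rough sketch of how this looks in practice: the linked documentation describes calling `sc.install_pypi_package(...)` on the SparkContext (`sc`) that the PySpark kernel provides, with an optional second argument for a repository URL. The helper below just wraps that call for a list of requirements. The helper name `install_notebook_packages`, the example package names, and the example versions are my own illustration, not part of the EMR API, and `sc` only exists inside an attached notebook session.

```python
def install_notebook_packages(sc, packages, index_url=None):
    """Install each requirement for the current notebook session only.

    `sc` is assumed to be the SparkContext of an EMR notebook PySpark
    kernel, which exposes install_pypi_package() per the AWS docs.
    `index_url` is an optional (e.g. private) PyPI repository URL.
    """
    for requirement in packages:
        if index_url is None:
            sc.install_pypi_package(requirement)
        else:
            sc.install_pypi_package(requirement, index_url)
    # Return what was requested so the notebook cell shows it.
    return list(packages)


# Inside an EMR notebook cell (PySpark kernel), usage might look like:
#   install_notebook_packages(sc, ["pandas==0.25.1", "matplotlib"])
#   sc.list_packages()              # inspect what the session now sees
#   sc.uninstall_package("pandas")  # remove it from this session again
```

Because the installation is scoped to the session, it needs no sudo or SSH access, and the packages go away when the notebook session ends.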

More details: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-managed-notebooks-scoped-libraries.html

Technical blog: https://aws.amazon.com/blogs/big-data/install-python-libraries-on-a-running-cluster-with-emr-notebooks/

Parag Chaudhari answered Sep 28 '22 04:09