This is very close to the following question, but I have added a few details specific to my setup:
Matplotlib Plotting using AWS-EMR jupyter notebook
I would like to find a way to use matplotlib inside my Jupyter notebook. Here is the problematic code snippet; it's fairly simple:
notebook
import matplotlib
matplotlib.use("agg")  # select the non-interactive backend before importing pyplot
import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4])
plt.show()
I chose this snippet because the following line alone fails, since it tries to use Tkinter (which is not installed on an AWS EMR cluster):
import matplotlib.pyplot as plt
When I run the full notebook snippet, there is no runtime error, but nothing happens either (no graph is shown). As I understand it, one way to make this work is to add either of the following snippets:
pyspark magic notation
%matplotlib inline
results
unknown magic command 'matplotlib'
UnknownMagic: unknown magic command 'matplotlib'
IPython explicit magic call
from IPython import get_ipython
get_ipython().run_line_magic('matplotlib', 'inline')
results
'NoneType' object has no attribute 'run_line_magic'
Traceback (most recent call last):
AttributeError: 'NoneType' object has no attribute 'run_line_magic'
Each of these should invoke a magic command that renders matplotlib plots inline (at least, that's my interpretation). I tried both after running this bootstrap action:
EMR bootstrap
sudo pip install matplotlib
sudo pip install ipython
Even with these installed, I still get an error saying there is no matplotlib magic. So my question is:
Question
How do I make matplotlib work in an AWS EMR Jupyter notebook?
(Or how do I view graphs and plot images in AWS EMR Jupyter notebook?)
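Some context on why the snippet above runs without error but shows nothing: "agg" is a non-interactive backend that only renders to image buffers, so plt.show() has no window to open and effectively does nothing. A minimal sketch, assuming the goal is simply to get the rendered image out, is to save the figure to an in-memory PNG instead of calling plt.show(). The helper name render_png is hypothetical.

```python
import io

import matplotlib
matplotlib.use("agg")  # non-interactive backend: renders images, never opens a window
import matplotlib.pyplot as plt


def render_png(values):
    """Plot `values` and return the figure as PNG bytes (hypothetical helper)."""
    fig, ax = plt.subplots()
    ax.plot(values)
    buf = io.BytesIO()
    fig.savefig(buf, format="png")  # write the rendered image into memory
    plt.close(fig)  # release the figure; nothing is ever displayed with agg
    return buf.getvalue()


png = render_png([1, 2, 3, 4])
print(png[:8])  # the first 8 bytes are the standard PNG file signature
```

The bytes can then be written to a file and downloaded, which works even where no inline display is available.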
Install Matplotlib
Make sure you have Jupyter Notebook installed first; then add Matplotlib to your virtual environment. To do so, open a command prompt and type pip install matplotlib. Then launch Jupyter Notebook by typing jupyter notebook at the command prompt.
The most straightforward way is to create a bash script containing your installation commands, copy it to S3, and add a bootstrap action in the console that points to the script. Bootstrap actions run on every node of the EMR cluster, so this installs the packages on all of them.
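A sketch of the same idea for clusters launched with boto3 rather than the console: the script text (reusing the commands from the question) and the BootstrapActions entry that EMR's run_job_flow accepts. The bucket and key (my-bucket, bootstrap/install-matplotlib.sh) are placeholders, and the upload and launch calls are left as comments so the snippet runs without AWS credentials.

```python
# Commands the bootstrap action should run on each node (taken from the question).
BOOTSTRAP_SCRIPT = """#!/bin/bash
set -e
sudo pip install matplotlib
sudo pip install ipython
"""


def bootstrap_action(bucket: str, key: str) -> dict:
    """Build one entry for the BootstrapActions list passed to run_job_flow."""
    return {
        "Name": "Install matplotlib",
        "ScriptBootstrapAction": {"Path": f"s3://{bucket}/{key}", "Args": []},
    }


# Placeholder location; replace with your own bucket and key.
action = bootstrap_action("my-bucket", "bootstrap/install-matplotlib.sh")
print(action["ScriptBootstrapAction"]["Path"])

# With credentials configured, the script would be uploaded and referenced like this:
# import boto3
# boto3.client("s3").put_object(Bucket="my-bucket", Key="bootstrap/install-matplotlib.sh", Body=BOOTSTRAP_SCRIPT)
# boto3.client("emr").run_job_flow(..., BootstrapActions=[action])
```

The remaining run_job_flow arguments (release label, instance configuration, roles) depend on your cluster and are deliberately not filled in.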
Why %matplotlib inline is used
The magic function %matplotlib inline enables inline plotting: plots are displayed directly below the cell that contains the plotting commands. It sets matplotlib's backend so that frontends such as Jupyter Notebook render the figures in the page.
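Note that this magic only exists inside an IPython kernel. In the question, get_ipython() returned None, which suggests the cell was not running in one: the sparkmagic PySpark kernel sends cell code to the cluster rather than executing it in the notebook's own kernel. A minimal guarded sketch (the function name enable_inline_plots is hypothetical) that turns on inline plotting only when an IPython shell is actually present:

```python
def enable_inline_plots() -> bool:
    """Run %matplotlib inline if we are inside IPython; report whether it ran."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False  # IPython is not installed in this Python process
    ip = get_ipython()
    if ip is None:
        return False  # IPython is installed, but no interactive shell is running
    ip.run_line_magic("matplotlib", "inline")
    return True


print(enable_inline_plots())
```

Run as a plain script, or in a remote Spark session, this prints False instead of raising the AttributeError shown in the question.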
As you mentioned, matplotlib is not installed on the EMR cluster, so importing it in a PySpark cell raises an error.
However, it is available in the managed Jupyter notebook instance (the Docker container). The %%local magic lets you run a cell there, locally, instead of on the cluster.
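A sketch of such a cell: in the notebook, the cell starts with %%local on its own first line, followed by ordinary matplotlib code like the block below. Because the cell runs in the notebook's local kernel, variables from the remote Spark session are not visible there, so this assumes the (small) data to plot is available as plain Python values; the numbers are placeholders.

```python
import matplotlib.pyplot as plt

# Placeholder data; a %%local cell cannot see variables from the remote Spark session.
xs = [1, 2, 3, 4]
ys = [10, 20, 25, 30]

fig, ax = plt.subplots()
ax.plot(xs, ys, marker="o")
ax.set_xlabel("x")
ax.set_ylabel("y")
plt.show()  # with Jupyter's default inline backend, the figure appears below the cell
```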
The answer by @00schneider actually works.
import matplotlib.pyplot as plt
# plot data here
plt.show()
After plt.show(), re-run a cell containing the magic below, and you will see the plot in your AWS EMR Jupyter PySpark notebook:
%matplot plt
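Put together, a cell in the EMR PySpark kernel might look like the sketch below. It assumes matplotlib is already installed on the cluster (for example through a bootstrap action), since this code runs there rather than in the notebook container. %matplot plt is an EMR Notebooks magic, not Python, so it appears as the final comment; in the notebook, write it without the leading '#'.

```python
import matplotlib
matplotlib.use("agg")  # the cluster has no display, so use the non-interactive backend
import matplotlib.pyplot as plt

plt.clf()  # start from an empty figure; pyplot state persists between cells in the session
plt.plot([1, 2, 3, 4])
plt.ylabel("value")

# In the EMR notebook, end the cell with this magic (without the leading '#'):
# %matplot plt
```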