I am using the PySpark kernel installed through Apache Toree in Jupyter Notebook, with Anaconda v4.0.0 (Python 2.7.11). After getting a table from Hive, I use matplotlib/pandas to plot some graphs in the Jupyter notebook, following this tutorial:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Set some Pandas options
pd.set_option('display.notebook_repr_html', False)
pd.set_option('display.max_columns', 20)
pd.set_option('display.max_rows', 25)
normals = pd.Series(np.random.normal(size=10))
normals.plot()
I got stuck at the first line when I tried to use %matplotlib inline, which shows:
Name: Error parsing magics!
Message: Magics [matplotlib] do not exist!
StackTrace:
Looking at Toree's Magic and MagicManager, I realised that %matplotlib invokes Toree's MagicManager instead of the IPython built-in magic command.
Is it possible for Apache Toree - PySpark to use the IPython built-in magic commands instead?
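In the meantime, one way to get plots to show up in a notebook without relying on any magic is to render the figure to PNG bytes using matplotlib's non-interactive Agg backend and hand those bytes to IPython's display machinery. This is a sketch of that workaround, not an official Toree feature; it assumes matplotlib and pandas are importable from the kernel's Python:

```python
import io

import matplotlib
matplotlib.use('Agg')  # headless backend: no %matplotlib magic required
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_to_png(series):
    """Plot a pandas Series and return the chart as PNG bytes."""
    fig, ax = plt.subplots()
    series.plot(ax=ax)
    buf = io.BytesIO()
    fig.savefig(buf, format='png')
    plt.close(fig)  # free the figure; we only keep the rendered bytes
    return buf.getvalue()


normals = pd.Series(np.random.normal(size=10))
png_bytes = plot_to_png(normals)

# In a notebook cell, display the bytes inline:
# from IPython.display import Image, display
# display(Image(data=png_bytes))
```

The `IPython.display` call is left commented so the snippet also runs outside a notebook; in a live cell, uncommenting those two lines renders the chart inline.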
Create a new kernel and point it to the root env in each project. To do so, create a directory 'pyspark' in /opt/wakari/wakari-compute/share/jupyter/kernels/. You may choose any name for 'display_name'. This configuration points to the Python executable in the root environment.
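For reference, the kernel spec that lives in that 'pyspark' directory is a kernel.json file. Below is a minimal sketch of one; the interpreter path and display name are illustrative only, so substitute the python executable of your own root environment:

```python
import json

# Hypothetical kernel.json contents for the 'pyspark' kernel directory.
# The interpreter path here is an example, not a verified install location.
spec = {
    "display_name": "PySpark (root env)",   # any name you like
    "language": "python",
    "argv": [
        "/opt/wakari/anaconda/bin/python",  # example: root env's python
        "-m", "ipykernel",
        "-f", "{connection_file}",          # Jupyter fills this placeholder in
    ],
}

kernel_json = json.dumps(spec, indent=2)
print(kernel_json)  # write this text to .../kernels/pyspark/kernel.json
```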
You can run the notebook document step by step (one cell at a time) by pressing Shift + Enter. You can run the whole notebook in a single step by clicking the menu Cell -> Run All. To restart the kernel (i.e. the computational engine), click the menu Kernel -> Restart.
I did a workaround hack to get PySpark and magic commands to work: instead of installing the Toree PySpark kernel, I am using PySpark directly in Jupyter Notebook.
Download and install Anaconda2 4.0.0
Download Spark 1.6.0, pre-built for Hadoop 2.6
Append the following lines to ~/.bashrc, then run source ~/.bashrc to update the environment variables:
# added to run spark
export PATH="{your_spark_dir}spark/sbin:$PATH"
export PATH="{your_spark_dir}spark/bin:$PATH"
# added to launch spark application in cluster mode
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre
# next 2 lines are optional, needed only for a Spark cluster
export HADOOP_CONF_DIR={your_hadoop_conf}/hadoop-conf
export YARN_CONF_DIR={your_hadoop_conf}/hadoop-conf
# added by Anaconda2 4.0.0 installer
export PATH="{your_anaconda_dir}/Anaconda/bin:$PATH"
# added to run pyspark in jupyter notebook
export PYSPARK_DRIVER_PYTHON={your_anaconda_dir}/Anaconda/bin/jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='0.0.0.0' --NotebookApp.port=8888"
export PYSPARK_PYTHON={your_anaconda_dir}/Anaconda/bin/python
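After sourcing ~/.bashrc, a quick sanity check from any Python prompt confirms the PySpark variables are visible. The helper below is something I'm adding for illustration; the variable names match the exports above:

```python
import os


def missing_pyspark_vars(env):
    """Return the PySpark driver variables absent from the given mapping."""
    required = (
        "PYSPARK_DRIVER_PYTHON",
        "PYSPARK_DRIVER_PYTHON_OPTS",
        "PYSPARK_PYTHON",
    )
    return [name for name in required if name not in env]


# After `source ~/.bashrc` this should print an empty list:
print(missing_pyspark_vars(os.environ))
```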
Running the Jupyter Notebook
Run pyspark --master yarn --deploy-mode client to start the notebook running PySpark against the YARN cluster.
Open a browser and enter IP_ADDRESS_OF_COMPUTER:8888
Disclaimer
This is only a workaround, not an actual fix for the problem. Please let me know if you find a way to make the IPython built-in magic commands (such as %matplotlib notebook) work with Toree PySpark.