 

Using Spark Kernel on Jupyter

So I am just starting out with Jupyter and the idea of notebooks.

I usually program in Vim and the terminal, so I am still trying to figure some things out.

I am trying to install a kernel capable of executing Spark and came across Apache Toree. I installed Toree and it appears when I list the kernels. Here is the result:

$ jupyter kernelspec list
Available kernels:
  python3    C:\Users\UserName\AppData\Local\Continuum\Anaconda3\lib\site-packages\ipykernel\resources
  bash       C:\Users\UserName\AppData\Roaming\jupyter\kernels\bash
  toree      C:\ProgramData\jupyter\kernels\toree


When I open a Toree notebook, the kernel dies and will not restart. Closing the notebook and reopening it results in the kernel switching to Python 3.

A large error message gets printed both to the host terminal and in the notebook. There is another post, currently on hold, that reports the same error messages.

I followed this page for the install: https://github.com/apache/incubator-toree

It appears these instructions are mostly for Linux/Mac.

Any thoughts on how to get a Spark notebook running on Jupyter?

I understand there is not a lot of information here; if more is needed, let me know.

asked Mar 29 '16 by user3025281

People also ask

Can you use Spark in Jupyter notebook?

PySpark allows users to interact with Apache Spark without having to learn a different language like Scala. The combination of Jupyter Notebooks with Spark provides developers with a powerful and familiar development environment while harnessing the power of Apache Spark.
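As an aside, one common way to wire PySpark into Jupyter without a separate kernel is through PySpark's own driver settings. `PYSPARK_DRIVER_PYTHON` and `PYSPARK_DRIVER_PYTHON_OPTS` are real PySpark environment variables, but treat the exact workflow here as a sketch rather than something taken from this page:

```shell
# Tell PySpark's launcher to start Jupyter as the driver front end,
# instead of the plain Python REPL
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
# With these set, running "$SPARK_HOME/bin/pyspark" opens a notebook
# server whose kernels start with a SparkContext (sc) already defined
```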




2 Answers

I posted a similar question to the Toree Gitter channel, and the reply (paraphrased) was:

Toree is the future of Spark programming on Jupyter. It will appear to install correctly on a Windows machine, but the .jar and .sh files it relies on will not run correctly on Windows.

Knowing this, I tried it on my Linux machine (Fedora) and a borrowed Mac. Once Jupyter was installed (via Anaconda), I entered these commands:

$ SparkHome="$HOME/spark/spark1.5.5-bin.hadoop2.6"
$ sudo pip install toree
 Password: **********
$ sudo jupyter toree install --spark_home="$SparkHome"
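Since an unexpanded or mistyped Spark path is one likely cause of the kernel dying at startup, a quick guard before the install can save a debugging round (the Spark path is just the example from the commands above; adjust to your download):

```shell
# A quoted "~" is not expanded by the shell, so build the Spark path
# from $HOME and verify the directory exists before installing Toree
SparkHome="$HOME/spark/spark1.5.5-bin.hadoop2.6"
if [ -d "$SparkHome" ]; then
  echo "Spark found at $SparkHome"
else
  echo "No Spark at $SparkHome -- fix the path before installing"
fi
```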

Jupyter ran the Toree notebook on both machines. I presume a VM might work as well. I want to see whether the Windows 10 bash shell will also work with this, as I am running Windows 7.

Thanks for the other docs!

answered Oct 23 '22 by user3025281


The answer from @user3025281 solved the issue for me as well. I had to make the following adjustment for my environment (an Ubuntu 16.04 Linux distro running Spark 2.2.0 and Hadoop 2.7). The downloads are direct file downloads from the hosting sites or a mirror site.

You'll mostly be configuring your environment variables and then calling jupyter, assuming it was installed through Anaconda. That's pretty much it.

export SPARK_HOME="$HOME/spark/spark-2.2.0-bin-hadoop2.7"

Write this into your ~/.bashrc file and then reload it with source:

# reload environment variables
source ~/.bashrc

Install

sudo pip install toree
sudo jupyter toree install --spark_home=$SPARK_HOME
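A quick sanity check between the two steps (the Spark path is this answer's example): confirm the variable actually reached the environment before installing, since `--spark_home=$SPARK_HOME` silently expands to nothing if the variable is unset.

```shell
# After `source ~/.bashrc`, SPARK_HOME should be non-empty in the
# current shell; an empty value makes the Toree install point nowhere
export SPARK_HOME="$HOME/spark/spark-2.2.0-bin-hadoop2.7"
echo "SPARK_HOME=$SPARK_HOME"
```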

Optional: on Windows 10, you could use "Bash on Ubuntu on Windows" to configure jupyter on a Linux distro.

answered Oct 23 '22 by Joyoyoyoyoyo