So I am just starting out with Jupyter and the idea of notebooks.
I usually program in VIM and terminal so I am still trying to figure out somethings.
I am trying to use a Toree kernel.
I am trying to install a kernel that is capable of executing spark and have come across Toree. I installed toree and it appears when I run kernel list. Here is the result:
$ jupyter kernelspec list
Available kernels:
python3 C:\Users\UserName\AppData\Local\Continuum\Anaconda3\lib\site-packages\ipykernel\resources
bash C:\Users\UserName\AppData\Roaming\jupyter\kernels\bash
toree C:\ProgramData\jupyter\kernels\toree
So when I open a toree notebook, the kernel dies and will not restart. Closing the notebook and reopening it results in the kernel changing to Python3.
There is a large error message that gets printed to the host terminal and the notebook error message. There is another post that has been put on hold; they are the same error messages.
I followed this page for the install: https://github.com/apache/incubator-toree
These instructions are mostly for Linux/Mac is appears.
Any thoughts on how to get a spark notebook on Jupyter?
I understand there is not a lot of information here, If more is needed. Let me know.
PySpark allows users to interact with Apache Spark without having to learn a different language like Scala. The combination of Jupyter Notebooks with Spark provides developers with a powerful and familiar development environment while harnessing the power of Apache Spark.
I posted a similar question to Gitter and they replied saying (paraphrased) that:
Toree is the future of spark programming on Jupyter and will appear to have installed correctly on a windows machine but the .jar and .sh files will not operate correctly on the windows machine.
Knowing this, I tried it on my Linux (Fedora) and a borrowed Mac. Once jupyter was installed (and Anaconda) I entered these commands:
$ SparkHome="~/spark/spark1.5.5-bin.hadoop2.6"
$ sudo pip install toree
Password: **********
$ sudo jupyter toree install --spark_home=$SparkHome
Jupyter ran the toree notebook on both machines. I presume that a VM might work as well. I want to see if the Window's 10 bash shell will also work with this as I am running windows 7.
Thanks for the other Docs!
The answer from @user3025281 solved the issue for me as well. I had to make the following adjustment for my environment (an Ubuntu 16.04 Linux distro running Spark 2.2.0 and Hadoop 2.7). The downloads are direct file downloads from the hosting sites or a mirror site.
You'll be mostly configurating your environment variables then calling jupyter, assuming it's been installed through anaconda. that's pretty much it
SPARK_HOME="~/spark/spark-2.2.0-bin-hadoop2.7"
Write this into your ~/.bashrc
file and then call source on `.bashrc
# reload environment variables
source ~/.bashrc`
Install
sudo pip install toree
sudo jupyter toree install --spark_home=$SPARK_HOME
Optional: On Windows 10, you could use "Bash on Ubuntu on Windows" for configurating jupyter on a linux distro
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With