 

Using Spark Kernel on Jupyter

So I am just starting out with Jupyter and the idea of notebooks.

I usually program in Vim and the terminal, so I am still trying to figure some things out.

I am trying to install a kernel capable of executing Spark and came across Apache Toree. I installed Toree and it appears when I list the kernels. Here is the result:

$ jupyter kernelspec list
Available kernels:
  python3    C:\Users\UserName\AppData\Local\Continuum\Anaconda3\lib\site-packages\ipykernel\resources
  bash       C:\Users\UserName\AppData\Roaming\jupyter\kernels\bash
  toree      C:\ProgramData\jupyter\kernels\toree


When I open a Toree notebook, the kernel dies and will not restart. Closing the notebook and reopening it results in the kernel switching to Python 3.

A large error message gets printed both to the host terminal and in the notebook. There is another post, currently on hold, that reports the same error messages.

I followed this page for the install: https://github.com/apache/incubator-toree

It appears these instructions are mostly for Linux/Mac.

Any thoughts on how to get a Spark notebook running on Jupyter?

I understand there is not a lot of information here; if more is needed, let me know.

asked Mar 29 '16 by user3025281

People also ask

Can you use Spark in Jupyter notebook?

PySpark allows users to interact with Apache Spark without having to learn a different language like Scala. The combination of Jupyter Notebooks with Spark provides developers with a powerful and familiar development environment while harnessing the power of Apache Spark.
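As an aside, one common way to wire PySpark into Jupyter without a separate kernel is through PySpark's own driver settings. `PYSPARK_DRIVER_PYTHON` and `PYSPARK_DRIVER_PYTHON_OPTS` are real PySpark environment variables, but treat the exact workflow here as a sketch rather than something taken from this page:

```shell
# Tell PySpark's launcher to start Jupyter as the driver front end,
# instead of the plain Python REPL
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
# With these set, running "$SPARK_HOME/bin/pyspark" opens a notebook
# server whose kernels start with a SparkContext (sc) already defined
```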




2 Answers

I posted a similar question to the Toree Gitter channel, and the reply (paraphrased) was:

Toree is the future of Spark programming on Jupyter. It will appear to install correctly on a Windows machine, but the .jar and .sh files it relies on will not run correctly on Windows.

Knowing this, I tried it on my Linux machine (Fedora) and a borrowed Mac. Once Jupyter was installed (via Anaconda), I entered these commands:

$ SparkHome="$HOME/spark/spark1.5.5-bin.hadoop2.6"
$ sudo pip install toree
 Password: **********
$ sudo jupyter toree install --spark_home="$SparkHome"
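Since an unexpanded or mistyped Spark path is one likely cause of the kernel dying at startup, a quick guard before the install can save a debugging round (the Spark path is just the example from the commands above; adjust to your download):

```shell
# A quoted "~" is not expanded by the shell, so build the Spark path
# from $HOME and verify the directory exists before installing Toree
SparkHome="$HOME/spark/spark1.5.5-bin.hadoop2.6"
if [ -d "$SparkHome" ]; then
  echo "Spark found at $SparkHome"
else
  echo "No Spark at $SparkHome -- fix the path before installing"
fi
```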

Jupyter ran the Toree notebook on both machines. I presume a VM might work as well. I want to see whether the Windows 10 bash shell will also work with this, as I am running Windows 7.

Thanks for the other docs!

answered Oct 23 '22 by user3025281


The answer from @user3025281 solved the issue for me as well. I had to make the following adjustment for my environment (an Ubuntu 16.04 Linux distro running Spark 2.2.0 and Hadoop 2.7). The downloads are direct file downloads from the hosting sites or a mirror site.

You'll mostly be configuring your environment variables and then calling jupyter, assuming it was installed through Anaconda. That's pretty much it.

export SPARK_HOME="$HOME/spark/spark-2.2.0-bin-hadoop2.7"

Write this into your ~/.bashrc file and then reload it with source:

# reload environment variables
source ~/.bashrc

Install

sudo pip install toree
sudo jupyter toree install --spark_home=$SPARK_HOME
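A quick sanity check between the two steps (the Spark path is this answer's example): confirm the variable actually reached the environment before installing, since `--spark_home=$SPARK_HOME` silently expands to nothing if the variable is unset.

```shell
# After `source ~/.bashrc`, SPARK_HOME should be non-empty in the
# current shell; an empty value makes the Toree install point nowhere
export SPARK_HOME="$HOME/spark/spark-2.2.0-bin-hadoop2.7"
echo "SPARK_HOME=$SPARK_HOME"
```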

Optional: on Windows 10, you could use "Bash on Ubuntu on Windows" to configure jupyter on a Linux distro.

answered Oct 23 '22 by Joyoyoyoyoyo