
Combining PyCharm, Spark and Jupyter

In my current setup I use a Jupyter notebook server with a pyspark profile to work with Spark. This all works great. However, I'm working on a fairly big project and the notebook environment is starting to feel limiting. I found out that PyCharm can run notebooks inside the IDE, which gives you more of the advantages of a full IDE than Jupyter alone.

Ideally I would run PyCharm locally rather than through a remote desktop session on the gateway, but running it on the gateway would be an acceptable alternative.

I'm first trying to get it to work on the gateway. With my (Spark) Jupyter server running and the server address set correctly to 127.0.0.1:8888, I create an .ipynb file. As soon as I type a line and press Enter (not running the cell, just adding a newline), I get the following error in the terminal I started PyCharm from:

ERROR - pplication.impl.LaterInvocator - Not a stub type: Py:IPNB_TARGET in class org.jetbrains.plugins.ipnb.psi.IpnbPyTargetExpression

Googling doesn't get me anywhere.

Jan van der Vegt asked Jan 20 '16 10:01




1 Answer

I was able to get all three working by installing Spark from the terminal on OS X. I then added the following packages to the PyCharm project interpreter: findspark and pyspark.
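Before running anything Spark-related, it can help to confirm that both packages are actually visible to the interpreter PyCharm is using. The following is a minimal sketch using only the standard library; the helper name missing_packages is made up for illustration and is not part of findspark or pyspark.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names the current interpreter cannot import."""
    # find_spec returns None when a top-level package is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["findspark", "pyspark"])
    if missing:
        print("Not installed in this interpreter:", ", ".join(missing))
    else:
        print("findspark and pyspark are both importable")
```

Running this from within the PyCharm project (so that it uses the project interpreter) should show whether the packages were added to the right environment.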

I tested it with:

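import random

import findspark

# Locate the local Spark installation and make pyspark importable.
# This must run before pyspark is imported.
findspark.init()
import pyspark

sc = pyspark.SparkContext(appName="Pi")
num_samples = 100000000

def inside(p):
    # Draw a random point in the unit square; True if it falls inside the quarter circle.
    x, y = random.random(), random.random()
    return x * x + y * y < 1

# The fraction of points inside the quarter circle approximates pi / 4.
count = sc.parallelize(range(0, num_samples)).filter(inside).count()
pi = 4.0 * count / num_samples
print(pi)
sc.stop()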

Output (the exact digits vary between runs because the samples are random): 3.14160028
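If the Spark job fails and it is unclear whether the problem is the environment or the sampling logic, the same estimate can be reproduced without Spark. This is a plain-Python sketch of the same Monte Carlo idea, not part of the setup above; the function name estimate_pi and its seed parameter are made up for illustration.

```python
import random


def estimate_pi(num_samples, seed=None):
    """Estimate pi by sampling random points in the unit square."""
    rng = random.Random(seed)  # a seeded generator makes runs repeatable
    count = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        # Same test as inside() in the Spark version.
        if x * x + y * y < 1:
            count += 1
    return 4.0 * count / num_samples


if __name__ == "__main__":
    # Far fewer samples than the Spark job, since this runs on a single core.
    print(estimate_pi(1000000))
```

If this prints a value close to 3.14 but the Spark version does not run, the issue most likely lies in the Spark or interpreter configuration rather than in the code itself.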

Troy Kirinhakone answered Oct 12 '22 21:10