In my current setup I use a Jupyter notebook server that has a pyspark profile to use Spark. This all works great. However, I'm working on a pretty big project, and the notebook environment is lacking a bit for me. I found out that PyCharm allows you to run notebooks inside the IDE, giving you more of the advantages of a full IDE than Jupyter does.
In the best-case scenario I would run PyCharm locally rather than over remote desktop on the gateway, but using the gateway would be an acceptable alternative.
I'm trying to get it to work on the gateway first. With my (Spark) Jupyter server running and the IP address correctly set to 127.0.0.1:8888, when I create an .ipynb file, enter a line, and press enter (not running it, just adding a newline), I get the following error in the terminal I started PyCharm from:
ERROR - pplication.impl.LaterInvocator - Not a stub type: Py:IPNB_TARGET in class org.jetbrains.plugins.ipnb.psi.IpnbPyTargetExpression
Googling doesn't get me anywhere.
Below is a table of differences between Jupyter and PyCharm.

S.No. | Jupyter | PyCharm
1 | Jupyter Notebook is a web-based interactive computing platform. | PyCharm is a smart code editor.
2 | The notebook combines live code, equations, narrative text, visualizations, interactive dashboards and other media. |
To start working with Jupyter notebooks in PyCharm:
1. Create a new Python project, specify a virtual environment, and install the jupyter package (a minimal install command is sketched after this list).
2. Open or create an .ipynb file.
3. Add and edit source cells.
4. Execute any of the code cells to launch the Jupyter server.
5. Analyze execution results in the Preview pane.
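For step 1, the install itself is a one-liner; a minimal sketch, assuming pip in the PyCharm terminal points at the project's virtual environment:

pip install jupyter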
How to set up Spark for PyCharm?
1. Navigate to Project Structure, click on "Add Content Root", go to the folder where Spark is set up, and select the python folder.
2. Click on "Add Content Root" again, go to the Spark folder, expand python, expand lib, and select py4j-0.9-src.zip.
3. Apply the changes and wait for the indexing to be done.
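The same effect can be approximated in code by putting Spark's Python sources on sys.path yourself; a minimal sketch, assuming Spark is installed at /usr/local/spark (the path and the py4j version are assumptions; adjust both to match your installation):

import os
import sys

SPARK_HOME = "/usr/local/spark"  # assumed install location; adjust as needed
sys.path.append(os.path.join(SPARK_HOME, "python"))
sys.path.append(os.path.join(SPARK_HOME, "python", "lib", "py4j-0.9-src.zip"))

import pyspark  # should now resolve without PyCharm's content-root changes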
Now, we can directly launch a Jupyter Notebook instance by running the pyspark command in the terminal. Important note: always make sure to refresh the terminal environment; otherwise, the newly added environment variables will not be recognized. Then visit the provided URL, and you are ready to interact with Spark via the Jupyter Notebook.
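This works because Spark lets you swap Jupyter in as the PySpark driver via two environment variables; a minimal sketch of the shell side, assuming a bash shell and a Spark install at /usr/local/spark (both are assumptions; adjust to your setup):

# Hypothetical additions to ~/.bashrc
export SPARK_HOME=/usr/local/spark
export PATH=$SPARK_HOME/bin:$PATH
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

# Refresh the current terminal so the new variables are picked up, then launch
source ~/.bashrc
pyspark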
I was able to get all three working by installing Spark via the terminal on OS X. Then I added the following packages to the PyCharm project interpreter: findspark and pyspark. Tested it out with:
import findspark
findspark.init()  # locate the Spark installation and put it on sys.path

import pyspark
import random

sc = pyspark.SparkContext(appName="Pi")
num_samples = 100000000

# A point drawn uniformly from the unit square falls inside the unit
# quarter-circle with probability pi/4.
def inside(p):
    x, y = random.random(), random.random()
    return x*x + y*y < 1

count = sc.parallelize(range(0, num_samples)).filter(inside).count()
pi = 4 * count / num_samples  # scale the inside fraction back up to pi
print(pi)
sc.stop()
outputting: 3.14160028