I am trying to work with PySpark in IntelliJ but I cannot figure out how to correctly install it / set up the project. I can work with Python in IntelliJ and I can use the pyspark shell, but I cannot tell IntelliJ how to find the Spark files (import pyspark results in "ImportError: No module named pyspark").
Any tips on how to include/import Spark so that IntelliJ can work with it are appreciated.
Thanks.
UPDATE:
I tried this piece of code:
from pyspark import SparkContext, SparkConf
spark_conf = SparkConf().setAppName("scavenge some logs")
spark_context = SparkContext(conf=spark_conf)
# Raw strings keep the backslashes literal ("\t" would otherwise be a tab character)
address = r"C:\test.txt"
log = spark_context.textFile(address)
my_result = log.filter(lambda x: 'foo' in x).saveAsTextFile(r'C:\my_result')
which produced the following error message:
Traceback (most recent call last):
File "C:/Users/U546816/IdeaProjects/sparktestC/.idea/sparktestfile", line 2, in <module>
spark_conf = SparkConf().setAppName("scavenge some logs")
File "C:\Users\U546816\Documents\Spark\lib\spark-assembly-1.3.1-hadoop2.4.0.jar\pyspark\conf.py", line 97, in __init__
File "C:\Users\U546816\Documents\Spark\lib\spark-assembly-1.3.1-hadoop2.4.0.jar\pyspark\context.py", line 221, in _ensure_initialized
File "C:\Users\U546816\Documents\Spark\lib\spark-assembly-1.3.1-hadoop2.4.0.jar\pyspark\java_gateway.py", line 35, in launch_gateway
File "C:\Python27\lib\os.py", line 425, in __getitem__
return self.data[key.upper()]
KeyError: 'SPARK_HOME'
Process finished with exit code 1
IntelliJ IDEA's run/debug configurations can execute an application locally or over an SSH connection, including launching it through the spark-submit script in Spark's bin directory. To develop Python scripts in IntelliJ IDEA, you first need to install Python and configure at least one Python SDK as the project interpreter; the system interpreter that comes with your Python installation can be used directly for all your scripts or serve as the base interpreter for a virtual environment.
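For comparison, running the same script outside the IDE goes through the spark-submit launcher in Spark's bin directory; a local invocation would look something like this (the script path is a placeholder):

%SPARK_HOME%\bin\spark-submit --master local[*] C:\path\to\sparktestfile.py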
Within IntelliJ itself, set the environment variables SPARK_HOME and PYTHONPATH in your program's run/debug configuration.
For instance:
SPARK_HOME=/Users/<username>/javalibs/spark-1.5.0-bin-hadoop2.4
PYTHONPATH=/Users/<username>/javalibs/spark-1.5.0-bin-hadoop2.4/python

(SPARK_HOME points at the root of the unpacked Spark distribution, and PYTHONPATH at the python directory inside it, so that import pyspark resolves.)
See the attached snapshot in IntelliJ IDEA.
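Alternatively, the script can set these variables itself before importing pyspark. A minimal sketch, assuming Spark is unpacked under a hypothetical path and that the py4j version in the zip name matches your distribution (Spark 1.3.x and 1.5.x ship py4j-0.8.2.1):

import os
import sys

# Hypothetical install location: point SPARK_HOME at the root of the Spark distribution
os.environ['SPARK_HOME'] = r'C:\Users\<username>\Documents\Spark'

# Make the bundled Python bindings and the py4j bridge importable
sys.path.append(os.path.join(os.environ['SPARK_HOME'], 'python'))
sys.path.append(os.path.join(os.environ['SPARK_HOME'], 'python', 'lib', 'py4j-0.8.2.1-src.zip'))

from pyspark import SparkContext, SparkConf  # now resolves without IDE configuration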