
Write and run pyspark in IntelliJ IDEA

I am trying to work with PySpark in IntelliJ, but I cannot figure out how to install it and set up the project correctly. I can work with Python in IntelliJ and I can use the pyspark shell, but I cannot tell IntelliJ how to find the Spark files (import pyspark results in "ImportError: No module named pyspark").

Any tips on how to include/import Spark so that IntelliJ can work with it are appreciated.

Thanks.

UPDATE:

I tried this piece of code:

from pyspark import SparkContext, SparkConf
spark_conf = SparkConf().setAppName("scavenge some logs")
spark_context = SparkContext(conf=spark_conf)
# Raw strings keep the backslashes literal ("\t" would otherwise become a tab).
address = r"C:\test.txt"
log = spark_context.textFile(address)

my_result = log.filter(lambda x: 'foo' in x).saveAsTextFile(r'C:\my_result')

with the following error messages:

Traceback (most recent call last):
  File "C:/Users/U546816/IdeaProjects/sparktestC/.idea/sparktestfile", line 2, in <module>
    spark_conf = SparkConf().setAppName("scavenge some logs")
  File "C:\Users\U546816\Documents\Spark\lib\spark-assembly-1.3.1-hadoop2.4.0.jar\pyspark\conf.py", line 97, in __init__
  File "C:\Users\U546816\Documents\Spark\lib\spark-assembly-1.3.1-hadoop2.4.0.jar\pyspark\context.py", line 221, in _ensure_initialized
  File "C:\Users\U546816\Documents\Spark\lib\spark-assembly-1.3.1-hadoop2.4.0.jar\pyspark\java_gateway.py", line 35, in launch_gateway
  File "C:\Python27\lib\os.py", line 425, in __getitem__
    return self.data[key.upper()]
KeyError: 'SPARK_HOME'

Process finished with exit code 1
asked Nov 02 '15 13:11 by tandy



1 Answer

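Set the SPARK_HOME and PYTHONPATH environment variables in the run/debug configuration of your program. SPARK_HOME should point to the root of the Spark installation (the directory that contains bin/), and PYTHONPATH to its python/ subdirectory, which holds the pyspark package. Note that the variable Python reads is PYTHONPATH, without an underscore.

For instance:

SPARK_HOME=/Users/<username>/javalibs/spark-1.5.0-bin-hadoop2.4
PYTHONPATH=/Users/<username>/javalibs/spark-1.5.0-bin-hadoop2.4/python

Depending on the Spark version, the Py4J archive bundled under python/lib may also have to be added to PYTHONPATH.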
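
If editing the run/debug configuration is not convenient, the same settings can probably be applied from inside the script, before pyspark is imported. The following is only a minimal sketch, assuming Spark is unpacked into a single directory that contains python/ and, under python/lib, a py4j-*-src.zip archive. The helper name configure_pyspark_env is hypothetical and not part of Spark.

```python
import glob
import os
import sys


def configure_pyspark_env(spark_home):
    """Expose a local Spark installation to the current Python process.

    Hypothetical helper: sets SPARK_HOME and puts Spark's Python sources
    on sys.path. Returns the list of entries that were newly added.
    """
    spark_home = os.path.abspath(spark_home)
    # pyspark's java_gateway looks up SPARK_HOME in the environment;
    # the KeyError in the traceback above comes from this lookup.
    os.environ["SPARK_HOME"] = spark_home

    python_dir = os.path.join(spark_home, "python")
    # Assumption: Py4J ships as a zip under python/lib; the version in the
    # file name differs between Spark releases, hence the glob.
    py4j_zips = glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip"))

    added = []
    for entry in [python_dir] + sorted(py4j_zips):
        if entry not in sys.path:
            sys.path.insert(0, entry)
            added.append(entry)
    return added
```

Call configure_pyspark_env with the Spark root directory (for example, the SPARK_HOME value shown above) at the top of the script, and only afterwards run "from pyspark import SparkContext, SparkConf". Unlike the run/debug configuration, this affects only the running process, so IntelliJ's code completion may still not see pyspark.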

[Screenshot: Run/Debug configuration for PySpark in IntelliJ IDEA]

answered Oct 04 '22 09:10 by Boubountu