I am trying to work with PySpark in IntelliJ but I cannot figure out how to correctly install it / set up the project. I can work with Python in IntelliJ and I can use the pyspark shell, but I cannot tell IntelliJ how to find the Spark files (import pyspark results in "ImportError: No module named pyspark").
Any tips on how to include/import Spark so that IntelliJ can work with it are appreciated.
Thanks.
UPDATE:
I tried this piece of code:
from pyspark import SparkContext, SparkConf
spark_conf = SparkConf().setAppName("scavenge some logs")
spark_context = SparkContext(conf=spark_conf)
# Raw strings keep the backslashes literal ("\t" would otherwise be a tab character)
address = r"C:\test.txt"
log = spark_context.textFile(address)
my_result = log.filter(lambda x: 'foo' in x).saveAsTextFile(r'C:\my_result')
which produced the following error message:
Traceback (most recent call last):
File "C:/Users/U546816/IdeaProjects/sparktestC/.idea/sparktestfile", line 2, in <module>
spark_conf = SparkConf().setAppName("scavenge some logs")
File "C:\Users\U546816\Documents\Spark\lib\spark-assembly-1.3.1-hadoop2.4.0.jar\pyspark\conf.py", line 97, in __init__
File "C:\Users\U546816\Documents\Spark\lib\spark-assembly-1.3.1-hadoop2.4.0.jar\pyspark\context.py", line 221, in _ensure_initialized
File "C:\Users\U546816\Documents\Spark\lib\spark-assembly-1.3.1-hadoop2.4.0.jar\pyspark\java_gateway.py", line 35, in launch_gateway
File "C:\Python27\lib\os.py", line 425, in __getitem__
return self.data[key.upper()]
KeyError: 'SPARK_HOME'
Process finished with exit code 1
IntelliJ IDEA's run/debug configurations can execute an application locally or over an SSH connection, including launching it through the spark-submit script in Spark's bin directory. To develop Python scripts in IntelliJ IDEA, you first need to install Python and configure at least one Python SDK as the project interpreter; the system interpreter that comes with your Python installation can be used directly for all your scripts or serve as the base interpreter for a virtual environment.
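For comparison, running the same script outside the IDE goes through the spark-submit launcher in Spark's bin directory; a local invocation would look something like this (the script path is a placeholder):

%SPARK_HOME%\bin\spark-submit --master local[*] C:\path\to\sparktestfile.py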
Within IntelliJ itself, set the environment variables SPARK_HOME and PYTHONPATH in your program's run/debug configuration.
For instance:
SPARK_HOME=/Users/<username>/javalibs/spark-1.5.0-bin-hadoop2.4
PYTHONPATH=/Users/<username>/javalibs/spark-1.5.0-bin-hadoop2.4/python

(SPARK_HOME points at the root of the unpacked Spark distribution, and PYTHONPATH at the python directory inside it, so that import pyspark resolves.)
See the attached snapshot in IntelliJ IDEA.
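Alternatively, the script can set these variables itself before importing pyspark. A minimal sketch, assuming Spark is unpacked under a hypothetical path and that the py4j version in the zip name matches your distribution (Spark 1.3.x and 1.5.x ship py4j-0.8.2.1):

import os
import sys

# Hypothetical install location: point SPARK_HOME at the root of the Spark distribution
os.environ['SPARK_HOME'] = r'C:\Users\<username>\Documents\Spark'

# Make the bundled Python bindings and the py4j bridge importable
sys.path.append(os.path.join(os.environ['SPARK_HOME'], 'python'))
sys.path.append(os.path.join(os.environ['SPARK_HOME'], 'python', 'lib', 'py4j-0.8.2.1-src.zip'))

from pyspark import SparkContext, SparkConf  # now resolves without IDE configuration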