I installed Spark on Windows, and I'm unable to start pyspark. When I type c:\Spark\bin\pyspark, I get the following error:
Python 3.6.0 |Anaconda custom (64-bit)| (default, Dec 23 2016, 11:57:41) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "c:\Spark\bin\..\python\pyspark\shell.py", line 30, in <module>
    import pyspark
  File "c:\Spark\python\pyspark\__init__.py", line 44, in <module>
    from pyspark.context import SparkContext
  File "c:\Spark\python\pyspark\context.py", line 36, in <module>
    from pyspark.java_gateway import launch_gateway
  File "c:\Spark\python\pyspark\java_gateway.py", line 31, in <module>
    from py4j.java_gateway import java_import, JavaGateway, GatewayClient
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
  File "c:\Spark\python\lib\py4j-0.10.4-src.zip\py4j\java_gateway.py", line 18, in <module>
  File "C:\Users\Eigenaar\Anaconda3\lib\pydoc.py", line 62, in <module>
    import pkgutil
  File "C:\Users\Eigenaar\Anaconda3\lib\pkgutil.py", line 22, in <module>
    ModuleInfo = namedtuple('ModuleInfo', 'module_finder name ispkg')
  File "c:\Spark\python\pyspark\serializers.py", line 393, in namedtuple
    cls = _old_namedtuple(*args, **kwargs)
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
What am I doing wrong here?
To work with PySpark, start a Command Prompt and change into your SPARK_HOME directory. To start a PySpark shell, run the bin\pyspark utility. Once you are in the PySpark shell, use the sc and sqlContext names, and type exit() to return to the Command Prompt.
To test whether your installation was successful, open an Anaconda Prompt, change to the SPARK_HOME directory, and type bin\pyspark. This should start the PySpark shell, which can be used to work interactively with Spark.
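For example, once the shell is up, a trivial session might look like the following (just an illustrative sketch; the banner and the exact objects available depend on your Spark build):

>>> sc.parallelize([1, 2, 3, 4]).sum()
10
>>> sqlContext.range(5).count()
5
>>> exit()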
Spark 2.1.0 doesn't support Python 3.6.0. To solve this, change the Python version in your Anaconda environment. Run the following commands in your Anaconda prompt:
conda create -n py35 python=3.5 anaconda
activate py35
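If pyspark still picks up the wrong interpreter after activating the environment, you can point Spark at the new environment's Python explicitly with the PYSPARK_PYTHON environment variable; the path below is only an example and depends on where Anaconda created the env:

:: example path - adjust to your own Anaconda install
set PYSPARK_PYTHON=C:\Users\<user>\Anaconda3\envs\py35\python.exe
c:\Spark\bin\pyspark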
Spark <= 2.1.0 is not compatible with Python 3.6. See this issue, which also claims that this will be fixed with the upcoming Spark release.
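As a quick sanity check (not part of the original answers), confirm which Python version is on your PATH before launching pyspark; anything reporting 3.6.x will hit the namedtuple error shown above with Spark 2.1.0:

python --version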