 

Unable to run pyspark

I installed Spark on Windows and I'm unable to start pyspark. When I run c:\Spark\bin\pyspark, I get the following error:

Python 3.6.0 |Anaconda custom (64-bit)| (default, Dec 23 2016, 11:57:41) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "c:\Spark\bin\..\python\pyspark\shell.py", line 30, in <module>
    import pyspark
  File "c:\Spark\python\pyspark\__init__.py", line 44, in <module>
    from pyspark.context import SparkContext
  File "c:\Spark\python\pyspark\context.py", line 36, in <module>
    from pyspark.java_gateway import launch_gateway
  File "c:\Spark\python\pyspark\java_gateway.py", line 31, in <module>
    from py4j.java_gateway import java_import, JavaGateway, GatewayClient
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
  File "c:\Spark\python\lib\py4j-0.10.4-src.zip\py4j\java_gateway.py", line 18, in <module>
  File "C:\Users\Eigenaar\Anaconda3\lib\pydoc.py", line 62, in <module>
    import pkgutil
  File "C:\Users\Eigenaar\Anaconda3\lib\pkgutil.py", line 22, in <module>
    ModuleInfo = namedtuple('ModuleInfo', 'module_finder name ispkg')
  File "c:\Spark\python\pyspark\serializers.py", line 393, in namedtuple
    cls = _old_namedtuple(*args, **kwargs)
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'

What am I doing wrong here?

asked Feb 20 '17 by DerkIII


People also ask

How do I run a PySpark program?

Another PySpark-specific way to run your programs is to use the shell that ships with PySpark itself. Again, using the Docker setup, you can connect to the container's CLI as described above, then launch the specialized Python shell with the following command:

    $ /usr/local/spark/bin/pyspark

How do I run PySpark on Windows?

To work with PySpark, start Command Prompt and change into your SPARK_HOME directory. To start a PySpark shell, run the bin\pyspark utility. Once you are in the PySpark shell, use the sc and sqlContext names (as sketched below) and type exit() to return to the Command Prompt.
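
A minimal sketch of such a session, assuming a Spark 2.x shell where sc and sqlContext are predefined (the output values are illustrative):

    >>> sc.version                      # SparkContext created by the shell
    '2.1.0'
    >>> df = sqlContext.createDataFrame([(1, 'a'), (2, 'b')], ['id', 'label'])
    >>> df.count()
    2
    >>> exit()                          # back to the Command Prompt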

How do I know if PySpark is working?

To test whether your installation was successful, open Anaconda Prompt, change to the SPARK_HOME directory and type bin\pyspark. This should start the PySpark shell, which can be used to work with Spark interactively.
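
A quick smoke test to run at the prompt once the shell is up (a sketch; any small distributed computation would do):

    >>> sc.parallelize(range(100)).sum()        # distribute a range and sum it
    4950
    >>> sc.parallelize(['a', 'b', 'c']).count()
    3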


2 Answers

Spark 2.1.0 doesn't support Python 3.6.0. To solve this, change the Python version in your Anaconda environment. Run the following commands in your Anaconda prompt:

    conda create -n py35 python=3.5 anaconda
    activate py35
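
Once the py35 environment is active, a quick check inside the pyspark shell (a sketch, assuming the environment created above) confirms that Spark picked up the 3.5 interpreter:

    >>> import sys
    >>> sys.version_info[:2]    # should now report (3, 5) rather than (3, 6)
    (3, 5)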
answered Sep 23 '22 by Satyam


Spark <= 2.1.0 is not compatible with Python 3.6. See this issue, which also claims that this will be fixed with the upcoming Spark release.
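The error in the question is a symptom of this incompatibility: PySpark replaces collections.namedtuple with a patched copy, and the copy helper in Spark 2.1.0 drops the keyword-only defaults that Python 3.6 added to namedtuple. A minimal sketch of the mechanism (not PySpark's exact code; run under Python 3.6):

    import collections
    import types

    def _copy_func(f):
        # Like PySpark 2.1.0's helper, this copy omits f.__kwdefaults__, so the
        # keyword-only defaults that Python 3.6's namedtuple declares
        # (verbose, rename, module) are lost in the copy.
        return types.FunctionType(f.__code__, f.__globals__, f.__name__,
                                  f.__defaults__, f.__closure__)

    _old_namedtuple = _copy_func(collections.namedtuple)

    # Raises the same error as in the question:
    # TypeError: namedtuple() missing 3 required keyword-only arguments:
    # 'verbose', 'rename', and 'module'
    _old_namedtuple('ModuleInfo', 'module_finder name ispkg')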

answered Sep 23 '22 by karlson