
No module named 'resource' installing Apache Spark on Windows

I am trying to install Apache Spark to run locally on my Windows machine. I have followed all the instructions here: https://medium.com/@loldja/installing-apache-spark-pyspark-the-missing-quick-start-guide-for-windows-ad81702ba62d

After this installation I am able to successfully start pyspark, and execute a command such as

textFile = sc.textFile("README.md")

When I then execute a command that operates on textFile such as

textFile.first()

Spark gives me the error 'worker failed to connect back', and the console shows an exception coming from worker.py: "ModuleNotFoundError: No module named 'resource'". Looking at the source file, I can see that worker.py does indeed try to import the resource module, but this module is not available on Windows. I understand that Spark can be installed on Windows, so how do I get around this?
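To confirm that the missing module is a platform issue rather than a broken Spark install, a minimal check along these lines can be run with the same Python interpreter that PySpark uses. The helper name has_resource_module is hypothetical and only for illustration; it relies on the standard-library importlib.util.find_spec.

```python
import importlib.util
import platform


def has_resource_module() -> bool:
    """Return True if the standard-library 'resource' module can be found here.

    'resource' is documented as Unix-only, so this is expected to be False
    on a Windows interpreter.
    """
    return importlib.util.find_spec("resource") is not None


if __name__ == "__main__":
    print(f"platform: {platform.system()}")
    print(f"'resource' importable: {has_resource_module()}")
```

If this reports that the module is not importable, the ModuleNotFoundError comes from the interpreter's platform and not from anything specific to the Spark installation steps.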

Hayden asked Nov 13 '18 02:11


People also ask

How do I know if PySpark is installed on Windows?

To test whether your installation was successful, open Command Prompt, change to the SPARK_HOME directory, and type bin\pyspark. This should start the PySpark shell, which can be used to work with Spark interactively.
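Before launching the shell by hand, the same check can be sketched in Python. This is a rough illustration that assumes SPARK_HOME is set as an environment variable and that the launcher script sits in its bin directory as pyspark.cmd (Windows) or pyspark (Unix-like); the function name find_pyspark_launcher is hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def find_pyspark_launcher(spark_home: Optional[str] = None) -> Optional[Path]:
    """Return the path of the pyspark launcher under SPARK_HOME, or None.

    Uses the explicit spark_home argument if given, otherwise the
    SPARK_HOME environment variable.
    """
    home = spark_home or os.environ.get("SPARK_HOME")
    if not home:
        return None
    bin_dir = Path(home) / "bin"
    # Prefer the Windows launcher, then fall back to the Unix shell script.
    for name in ("pyspark.cmd", "pyspark"):
        candidate = bin_dir / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    launcher = find_pyspark_launcher()
    print(launcher if launcher else "No pyspark launcher found; check SPARK_HOME")
```

A None result only means the launcher was not found where this sketch looks; it does not say anything about whether jobs will run once the shell starts.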


1 Answer

I struggled with the same problem all morning. Your best bet is to downgrade to Spark 2.3.2.
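After swapping in the older release, it is worth confirming that the shell really picked it up. A minimal sketch, assuming the version string is read from sc.version inside the pyspark shell; the helpers parse_version and matches_target are hypothetical names for illustration.

```python
import re


def parse_version(text: str) -> tuple:
    """Turn a version string such as '2.3.2' into a tuple such as (2, 3, 2).

    Only the first three numeric components are kept.
    """
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def matches_target(reported: str, target: str = "2.3.2") -> bool:
    """Return True if the reported version equals the target version."""
    return parse_version(reported) == parse_version(target)
```

Inside the pyspark shell this could be used as matches_target(sc.version); a False result suggests that SPARK_HOME or PATH still points at the previous installation.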

Luv answered Sep 20 '22 05:09