I am trying to install Apache Spark to run locally on my Windows machine. I have followed all of the instructions at https://medium.com/@loldja/installing-apache-spark-pyspark-the-missing-quick-start-guide-for-windows-ad81702ba62d
After this installation, I can successfully start pyspark and execute a command such as
textFile = sc.textFile("README.md")
When I then execute a command that operates on textFile such as
textFile.first()
Spark gives me the error 'worker failed to connect back', and I can see an exception in the console coming from worker.py: 'ModuleNotFoundError: No module named 'resource''. Looking at the source file, I can see that worker.py does indeed try to import the resource module; however, this module is not available on Windows. I understand that Spark can be installed on Windows, so how do I get around this?
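One way to confirm the diagnosis in the question is to ask the Python interpreter that the worker uses whether `resource` can be found at all; the standard library only ships it on Unix-like platforms. This is a minimal sketch, and `resource_module_available` is a hypothetical helper name, not part of Spark:

```python
import importlib.util
import sys


def resource_module_available() -> bool:
    """Return True if the standard-library 'resource' module can be imported here."""
    # find_spec returns None when no module with that name can be located
    return importlib.util.find_spec("resource") is not None


if __name__ == "__main__":
    # On Windows this is expected to print False, matching the ModuleNotFoundError above
    print(sys.platform, resource_module_available())
```

If this reports False for the interpreter that PySpark launches, any worker.py that imports `resource` unconditionally will fail exactly as described.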
To test whether your installation was successful, open Command Prompt, change to the SPARK_HOME directory, and type bin\pyspark. This should start the PySpark shell, which you can use to work with Spark interactively.
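The same check can be scripted. The sketch below only builds the launcher path from SPARK_HOME. It assumes the usual layout of a Spark download, where bin contains `pyspark` for Unix shells and `pyspark.cmd` for Command Prompt; `pyspark_launcher` is a hypothetical helper:

```python
import ntpath
import os
import posixpath


def pyspark_launcher(spark_home: str, windows: bool) -> str:
    """Build the path to the PySpark shell launcher under SPARK_HOME (assumed layout)."""
    if windows:
        # In Command Prompt, typing bin\pyspark resolves to bin\pyspark.cmd
        return ntpath.join(spark_home, "bin", "pyspark.cmd")
    return posixpath.join(spark_home, "bin", "pyspark")


if __name__ == "__main__":
    home = os.environ.get("SPARK_HOME")
    if home is None:
        print("SPARK_HOME is not set")
    else:
        print(pyspark_launcher(home, windows=(os.name == "nt")))
```

Usage: on Windows, `pyspark_launcher(r"C:\spark", True)` gives `C:\spark\bin\pyspark.cmd`, which you could then pass to `subprocess.run` to start the shell.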
I struggled with the same problem the whole morning. Your best bet is to downgrade to Spark 2.3.2.
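After downgrading, it is worth confirming which version is actually being picked up. Inside the PySpark shell, `sc.version` reports the running Spark version. For a pip-installed pyspark, the package metadata can be checked as in this sketch. That is an assumption: a Spark download unpacked under SPARK_HOME, as in the guide linked in the question, is not visible to this lookup. The helper names are hypothetical:

```python
from importlib import metadata
from typing import Optional

TARGET = "2.3.2"  # version suggested in the answer above


def installed_pyspark_version() -> Optional[str]:
    """Version of a pip-installed pyspark package, or None if none is installed."""
    try:
        return metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        return None


def matches_target(version: Optional[str], target: str = TARGET) -> bool:
    """Compare major.minor.patch as strings so suffixes like '.dev0' don't break parsing."""
    return version is not None and version.split(".")[:3] == target.split(".")[:3]


if __name__ == "__main__":
    v = installed_pyspark_version()
    print(v, matches_target(v))
```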