The system cannot find the path specified error while running pyspark

I just downloaded spark-2.3.0-bin-hadoop2.7.tgz. After downloading it I followed the steps mentioned here: pyspark installation for Windows 10. I used the command bin\pyspark to run Spark and got this error message:

The system cannot find the path specified

Attached is a screenshot of the error message.

Attached is a screenshot of my Spark bin folder.

Attached are screenshots of my Path variable.

I have Python 3.6 and Java 1.8.0_151 on my Windows 10 system. Can you suggest how to resolve this issue?

asked Mar 17 '18 by Christina Hughes


6 Answers

Actually, the problem was with the JAVA_HOME environment variable. It was previously set to ...\jdk\bin.

Stripping the trailing \bin from JAVA_HOME, while keeping ...\jdk\bin in the system Path variable (%PATH%), did the trick.
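
A minimal sketch of the fix from a command prompt (the JDK path here is only an example; use your actual install location):

rem Wrong: JAVA_HOME ends in \bin
rem   JAVA_HOME=C:\Program Files\Java\jdk1.8.0_151\bin

rem Right: JAVA_HOME is the JDK root, and Path keeps the \bin entry
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_151"
setx PATH "%PATH%;C:\Program Files\Java\jdk1.8.0_151\bin"

(Values set with setx only appear in consoles opened afterwards.)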

answered Oct 03 '22 by Alee Ahmed


My problem was that JAVA_HOME was pointing to the JRE folder instead of the JDK. Make sure you check for that.
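
A quick way to tell the two apart (a sketch): a JDK ships javac.exe, a JRE does not.

rem Prints a compiler version only if JAVA_HOME points at a JDK
"%JAVA_HOME%\bin\javac" -version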

answered Oct 03 '22 by Michal


I worked hours and hours on this. My problem was with the Java 10 installation; I uninstalled it and installed Java 8, and now PySpark works.
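
A quick sanity check after switching (the exact build number will differ on your machine):

> java -version
java version "1.8.0_151"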

answered Oct 03 '22 by aghd


Switching SPARK_HOME to C:\spark\spark-2.3.0-bin-hadoop2.7 and changing PATH to include %SPARK_HOME%\bin did the trick for me.

Originally my SPARK_HOME was set to C:\spark\spark-2.3.0-bin-hadoop2.7\bin and PATH was referencing it as %SPARK_HOME%.

Running a spark command directly in my SPARK_HOME dir worked, but only once. After that initial success I hit the same error as yours, and echo %SPARK_HOME% was showing C:\spark\spark-2.3.0-bin-hadoop2.7\bin\.. (note the trailing \..). I suspect spark-shell2.cmd had edited it while trying to get itself working, which led me here.
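
Put another way, a sketch of the corrected values, using the paths from this answer:

rem SPARK_HOME is the extracted folder itself, without \bin
setx SPARK_HOME "C:\spark\spark-2.3.0-bin-hadoop2.7"
rem Path should then contain the entry %SPARK_HOME%\bin rather than %SPARK_HOME%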

answered Oct 03 '22 by yoyomeng


For those who use Windows and are still trying: what solved it for me was reinstalling Python (3.9) as a local user (c:\Users\<user>\AppData\Local\Programs\Python) and setting both environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON to c:\Users\<user>\AppData\Local\Programs\Python\python.exe.
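
As a sketch (replace <user> with your actual account name before running):

setx PYSPARK_PYTHON "c:\Users\<user>\AppData\Local\Programs\Python\python.exe"
setx PYSPARK_DRIVER_PYTHON "c:\Users\<user>\AppData\Local\Programs\Python\python.exe"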

answered Oct 03 '22 by noStaleReads


Fixing problems installing PySpark (Windows)

Incorrect JAVA_HOME path

> pyspark  
The system cannot find the path specified.

Open System Environment variables:

rundll32 sysdm.cpl,EditEnvironmentVariables

Set JAVA_HOME: System Variables > New:

Variable Name: JAVA_HOME
Variable Value: C:\Program Files\Java\jdk1.8.0_261

Also, check that SPARK_HOME and HADOOP_HOME are correctly set, e.g.:

SPARK_HOME=C:\Spark\spark-3.2.0-bin-hadoop3.2
HADOOP_HOME=C:\Spark\spark-3.2.0-bin-hadoop3.2

Important: Double-check the following

  1. The path exists
  2. The path does not include the \bin folder
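
Both points can be checked quickly from a command prompt (a sketch):

rem Each should list the folder contents rather than fail
dir "%JAVA_HOME%"
dir "%SPARK_HOME%"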

Incorrect Java version

> pyspark
WARN SparkContext: Another SparkContext is being constructed 
UserWarning: Failed to initialize Spark session.
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.storage.StorageUtils$

Ensure that JAVA_HOME is set to Java 8 (jdk1.8.0)
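
To confirm which JDK JAVA_HOME points at (the build shown is just an example):

> "%JAVA_HOME%\bin\java" -version
java version "1.8.0_261"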

winutils not installed

> pyspark
WARN Shell: Did not find winutils.exe
java.io.FileNotFoundException: Could not locate Hadoop executable

Download winutils.exe and copy it to your Spark home's bin folder. In PowerShell:

 Invoke-WebRequest -Uri https://github.com/steveloughran/winutils/raw/master/hadoop-3.0.0/bin/winutils.exe -OutFile C:\Spark\spark-3.2.0-bin-hadoop3.2\bin\winutils.exe
answered Oct 03 '22 by Leo103