
Environment variables set up in Windows for pyspark

I have Spark installed on my laptop, and I am able to run the spark-shell command and open the Scala shell, as shown below:

C:\Spark1_6\spark-1.6.0-bin-hadoop2.6\bin>spark-shell
scala>

But when I try to run the pyspark command:

C:\Spark1_6\spark-1.6.0-bin-hadoop2.6\bin>pyspark

I get the following error message:

'python' is not recognized as an internal or external command

I did set up the user 'Path' environment variable manually, by appending:

";C:\Python27"

I rebooted the laptop and still get the same error. Can anyone please help me fix this? Am I not updating the environment variable correctly?
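A quick way to check whether the Path change took effect (assuming the standard Python 2.7 layout, where python.exe sits directly in C:\Python27 rather than in a bin subfolder) is to open a fresh Command Prompt and run:

C:\>echo %PATH%
C:\>where python

If where python prints nothing, the shell cannot locate the interpreter either.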

Versions: Spark 1.6.2, Windows 8.1

Asked Jun 15 '17 by Sri


1 Answer

The Spark documentation is available. Don't be afraid, read it.

http://spark.apache.org/docs/1.6.0/configuration.html#environment-variables

Certain Spark settings can be configured through environment variables, which are read from ... conf\spark-env.cmd on Windows
...
PYSPARK_PYTHON   Python binary executable to use for PySpark in both driver and workers (default is python2.7 if available, otherwise python).
PYSPARK_DRIVER_PYTHON   Python binary executable to use for PySpark in driver only (default is PYSPARK_PYTHON).
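
Since those variables are read from conf\spark-env.cmd on Windows, one persistent option is to set them there. A minimal sketch, assuming Python 2.7 is installed at C:\Python27:

rem C:\Spark1_6\spark-1.6.0-bin-hadoop2.6\conf\spark-env.cmd
set PYSPARK_PYTHON=C:\Python27\python.exe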

Or, for a quick one-off test in the current Command Prompt session, try something like this:

set PYSPARK_PYTHON=C:\Python27\python.exe
pyspark
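
If the interpreter path is right, pyspark should now drop into the Python REPL, mirroring the scala> prompt shown in the question:

C:\Spark1_6\spark-1.6.0-bin-hadoop2.6\bin>pyspark
>>>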
Answered Sep 17 '22 by Samson Scharfrichter