 

Pyspark command not recognised

I have Anaconda installed, and I have also downloaded Spark 1.6.2. I am using the instructions from this answer to configure Spark for Jupyter.

I have downloaded and unzipped the spark directory as

~/spark

Now when I cd into this directory and then into bin, I see the following:

SFOM00618927A:spark $ cd bin
SFOM00618927A:bin $ ls
beeline         pyspark         run-example.cmd     spark-class2.cmd    spark-sql       sparkR
beeline.cmd     pyspark.cmd     run-example2.cmd    spark-shell     spark-submit        sparkR.cmd
load-spark-env.cmd  pyspark2.cmd        spark-class     spark-shell.cmd     spark-submit.cmd    sparkR2.cmd
load-spark-env.sh   run-example     spark-class.cmd     spark-shell2.cmd    spark-submit2.cmd

I have also added the environment variables mentioned in that answer to my .bash_profile and .profile.

Now, in the spark/bin directory, the first thing I want to check is whether the pyspark command works in the shell at all.

So after cd spark/bin I run:

SFOM00618927A:bin $ pyspark
-bash: pyspark: command not found

According to that answer, after following all the steps I should be able to just run

pyspark 

in the terminal from any directory, and it should start a Jupyter notebook with the Spark engine. But pyspark does not even work within the shell, let alone launch a Jupyter notebook.

Please advise what is going wrong here.

Edit:

I did

open .profile 

in my home directory, and this is what is stored in the PATH:

export PATH=/Users/854319/anaconda/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/Users/854319/spark/bin
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS='notebook' pyspark
asked Aug 05 '16 by Baktaawar


1 Answer

1- You need to set JAVA_HOME and the Spark paths for the shell to find them. After setting them in your .profile you may want to run

source ~/.profile

to activate the settings in the current session. From your comment I can see you are already hitting the JAVA_HOME issue.
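
For reference, here is a minimal sketch of what the relevant .profile entries could look like on macOS. The ~/spark location matches your layout; /usr/libexec/java_home is the standard macOS helper that prints the path of the installed JDK:

export JAVA_HOME=$(/usr/libexec/java_home)   # resolves the installed JDK location
export SPARK_HOME=~/spark                    # assumes Spark was unpacked to ~/spark
export PATH=$SPARK_HOME/bin:$PATH            # makes pyspark resolvable from any directory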

Note that if you have a .bash_profile or .bash_login, .profile will not work as described here: for login shells, bash reads only the first of .bash_profile, .bash_login, and .profile that exists.
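
If that is your situation, one common workaround (a sketch, assuming bash) is to have .bash_profile source .profile so the settings are picked up either way:

# in ~/.bash_profile
if [ -f ~/.profile ]; then
    . ~/.profile    # pull in the PATH and PYSPARK_* settings defined there
fi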

2- When you are in spark/bin you need to run

./pyspark

to tell the shell that the target is in the current folder. A bare pyspark only searches the directories listed in PATH, and the current directory is not on it by default.
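
Once the PATH entry is active, a quick way to confirm that the shell resolves the command (which is a standard utility; the expected path assumes your ~/spark layout):

source ~/.profile
which pyspark    # should print /Users/854319/spark/bin/pyspark
pyspark          # should now start the Jupyter notebook driver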

answered Sep 29 '22 by shuaiyuancn