
Zeppelin %python.conda and %python.sql interpreters do not work without adding Anaconda libraries to $PATH

I have the following situation: I want to use Anaconda3 with Zeppelin and Spark.

I have installed the following components:

  • HDP 2.5
  • Spark 2.0.0.x (the version which comes with HDP 2.5)
  • Zeppelin 0.7.3
  • Anaconda3 with Python 3.5.4 (PySpark in Spark 2.0.0 and Python 3.6 are not friends)
  • Python 2.7 (ships with HDP 2.5 and is installed in /usr/bin, which is on $PATH)

I configured the Zeppelin Python interpreter to point to my Anaconda Python, in my case /opt/anaconda3/bin/python, and this works. I also added the following to the zeppelin.sh script:

export PYTHONPATH="${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip"
export SPARK_YARN_USER_ENV="PYTHONPATH=${PYTHONPATH}"
export PYSPARK_DRIVER_PYTHON="/var/opt/teradata/anaconda3/envs/py35/bin/ipython"
export PYSPARK_PYTHON="/var/opt/teradata/anaconda3/envs/py35/bin/python"
export PYLIB="/var/opt/teradata/anaconda3/envs/py35/lib"

Up to this point everything works.
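One detail worth checking in the exports above: the py4j zip name is hard-coded, and the bundled py4j version differs between Spark releases. Below is a minimal sketch that looks the file up instead, assuming the zip lives under ${SPARK_HOME}/python/lib as in the PYTHONPATH line. The function name find_py4j_zip and the version number in the demo file name are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the first py4j source zip found under a Spark home.
find_py4j_zip() {
    # $1 = Spark home directory
    for candidate in "$1"/python/lib/py4j-*-src.zip; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # No matching file: report failure so the caller can react.
    return 1
}

# Demo against a throwaway directory so nothing on the real system is touched.
# The version number in the file name is an assumption, not taken from HDP.
fake_spark_home=$(mktemp -d)
mkdir -p "$fake_spark_home/python/lib"
touch "$fake_spark_home/python/lib/py4j-0.10.1-src.zip"

find_py4j_zip "$fake_spark_home"
rm -rf "$fake_spark_home"
```

In zeppelin.sh this could be used as export PYTHONPATH="${SPARK_HOME}/python:$(find_py4j_zip "${SPARK_HOME}")", so the setting keeps working if the Spark version changes.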

When I try the %python.conda and %python.sql interpreters, they fail because neither the conda command nor pandas can be found. If I add the Anaconda directories to the $PATH environment variable, Zeppelin finds them, but as a side effect the default Python for the whole environment becomes 3.5 instead of 2.7, and I start getting errors like this one:

apache.zeppelin.interpreter.InterpreterException:   File "/usr/bin/hdp-select", line 205
    print "ERROR: Invalid package - " + name
                                    ^
SyntaxError: Missing parentheses in call to 'print'
ls: cannot access /usr/hdp//hadoop/lib: No such file or directory
Exception in thread "main" java.lang.IllegalStateException: hdp.version is not set while running Spark under HDP, please set through HDP_VERSION in spark-env.sh or add a java-opts file in conf with -Dhdp.version=xxx

When I remove the Python 3 directories from $PATH again, it works as before.
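The behaviour described here comes down to lookup order: whichever directory appears first on $PATH supplies the python that other scripts pick up. The sketch below reproduces that with two fake launchers in a temporary directory, so it is safe to run anywhere. The helper name resolve_with_path and the directory names are invented for the illustration.

```shell
#!/bin/sh
# Hypothetical helper: show which executable a given PATH value would select.
resolve_with_path() {
    # $1 = command name, $2 = PATH value to test
    # The subshell keeps the PATH change from leaking into the caller.
    ( PATH="$2"; command -v "$1" )
}

# Demo with two fake "python" launchers in a throwaway directory.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/system" "$demo_dir/anaconda"
printf '#!/bin/sh\necho "fake Python 2.7"\n' > "$demo_dir/system/python"
printf '#!/bin/sh\necho "fake Python 3.5"\n' > "$demo_dir/anaconda/python"
chmod +x "$demo_dir/system/python" "$demo_dir/anaconda/python"

# Only the "system" directory on PATH: the 2.7 stand-in is selected.
resolve_with_path python "$demo_dir/system"

# Anaconda directory prepended: the 3.5 stand-in now shadows the system one.
resolve_with_path python "$demo_dir/anaconda:$demo_dir/system"

rm -rf "$demo_dir"
```

On the real host, comparing the output of command -v python before and after the $PATH change, and looking at the first line of /usr/bin/hdp-select, would show whether that script ends up running under the Anaconda interpreter, which is what the SyntaxError suggests.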

Is there a good way to configure my environment so that everything works and stays manageable and easy to maintain?

I was thinking of creating symlinks in /var/lib for the files that need to be found, but I don’t know how many would be needed, and I don’t want to create links for everything except python3.

Any comment will be highly appreciated.

Kind Regards, Paul

Playing With BI asked Sep 18 '25 19:09


1 Answer

I ran into the same error. After investigating, I tracked down its source: Zeppelin appears to default to "/bin/conda" as the path for conda.

I was able to fix it by doing the following:

  • Create a symlink at /bin/conda that points to Anaconda's conda: ln -s /opt/anaconda3/bin/conda /bin/conda
  • Create a symlink at /bin/python that points to Anaconda's python: ln -s /opt/anaconda3/bin/python /bin/python
  • In the settings for the Python interpreter, set zeppelin.python to /opt/anaconda3/bin/python3
  • In /usr/lib/zeppelin/conf/zeppelin-env.sh, set PYTHONPATH: export PYTHONPATH=/opt/anaconda3/bin
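A note on the symlink steps: plain ln -s refuses to overwrite an existing path, and on a system where /bin/python already exists it is probably best left untouched. Here is a cautious sketch that creates a link only when nothing is there yet. The helper name link_if_missing is made up, and the demo runs in a temporary directory standing in for /opt/anaconda3/bin and /bin.

```shell
#!/bin/sh
# Hypothetical helper: create a symlink only when nothing exists at the link path.
link_if_missing() {
    # $1 = target the link should point to, $2 = path of the link to create
    if [ -e "$2" ] || [ -L "$2" ]; then
        echo "skip: $2 already exists" >&2
        return 1
    fi
    ln -s "$1" "$2"
}

# Demo in a throwaway directory; nothing under the real /bin is modified.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/anaconda3/bin" "$sandbox/bin"
touch "$sandbox/anaconda3/bin/conda"

# First call creates the link.
link_if_missing "$sandbox/anaconda3/bin/conda" "$sandbox/bin/conda"

# Second call finds the existing link and skips it (non-zero exit, ignored here).
link_if_missing "$sandbox/anaconda3/bin/conda" "$sandbox/bin/conda" || true

rm -rf "$sandbox"
```

With the real paths from the steps above, the call would be link_if_missing /opt/anaconda3/bin/conda /bin/conda, run as a user allowed to write to /bin.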

There also appears to be a JIRA issue for this behavior.

Matt Howell answered Sep 21 '25 12:09