
PyCharm overwrites PYTHONPATH in a docker container being used as an interpreter

I have a docker image containing various bits, including Spark. Here is my Dockerfile:

FROM docker-dev.artifactory.company.com/centos:7.3.1611

# set proxy
ENV http_proxy http://proxyaddr.co.uk:8080
ENV HTTPS_PROXY http://proxyaddr.co.uk:8080
ENV https_proxy http://proxyaddr.co.uk:8080

RUN yum install -y epel-release
RUN yum install -y gcc
RUN yum install -y krb5-devel
RUN yum install -y python-devel
RUN yum install -y krb5-workstation
RUN yum install -y python-setuptools
RUN yum install -y python-pip
RUN yum install -y xmlstarlet
RUN yum install -y wget java-1.8.0-openjdk
RUN pip install kerberos
RUN pip install numpy
RUN pip install pandas
RUN pip install coverage
RUN pip install tensorflow
RUN wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.0-bin-hadoop2.6.tgz
RUN tar -xvzf spark-1.6.0-bin-hadoop2.6.tgz -C /opt
RUN ln -s spark-1.6.0-bin-hadoop2.6 /opt/spark


ENV VERSION_NUMBER $(cat VERSION)
ENV JAVA_HOME /etc/alternatives/jre/
ENV SPARK_HOME /opt/spark
ENV PYTHONPATH $SPARK_HOME/python/:$PYTHONPATH
ENV PYTHONPATH $SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH

I can build and run that docker image, connect to it, and successfully import the pyspark libraries:

$ docker run -d -it sse_spark_build:1.0
09e8aac622d7500e147a6e6db69f806fe093b0399b98605c5da2ff5e0feca07c
$ docker exec -it 09e8aac622d7 python
Python 2.7.5 (default, Nov  6 2016, 00:28:07)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from pyspark import SparkContext
>>> import os
>>> os.environ['PYTHONPATH']
'/opt/spark/python/lib/py4j-0.9-src.zip:/opt/spark/python/:'
>>>

Note the value of PYTHONPATH!

The problem is that the behaviour in PyCharm is different if I use this same docker image as the interpreter. Here's how I have set up the interpreter:

[Screenshot: Python interpreter setup]

If I then open the Python console in PyCharm, this happens:

bec0b9189066:python /opt/.pycharm_helpers/pydev/pydevconsole.py 0 0
PyDev console: starting.
import sys; print('Python %s on %s' % (sys.version, sys.platform))
sys.path.extend(['/home/cengadmin/git/dhgitlab/sse/engine/fs/programs/pyspark', '/home/cengadmin/git/dhgitlab/sse/engine/fs/programs/pyspark'])
Python 2.7.5 (default, Nov  6 2016, 00:28:07) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
import os
os.environ['PYTHONPATH']
'/opt/.pycharm_helpers/pydev'

As you can see, PyCharm has changed PYTHONPATH, which means I can no longer import the pyspark libraries that I want to use:

from pyspark import SparkContext
Traceback (most recent call last):
  File "<input>", line 1, in <module>
ImportError: No module named pyspark

OK, I could extend sys.path from the console to make it work:

import sys
sys.path.append('/opt/spark/python/')
sys.path.append('/opt/spark/python/lib/py4j-0.9-src.zip')

but it's tedious to have to do that every time I open a console. I can't believe there isn't a way of telling PyCharm to append to PYTHONPATH rather than overwrite it, but if there is, I can't find it. Can anyone offer any advice? How can I use a docker image as PyCharm's remote interpreter and keep the value of PYTHONPATH?

asked Jul 29 '17 21:07 by jamiet


1 Answer

You can set that in Preferences, in the Python Console settings. See the image below.

[Screenshot: Setting the environment setup]

You can either set the Environment variables field or update the Starting script section. Use whichever suits you better; both will do the job.
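As a rough sketch of the Starting script route, something like the following could be pasted into that field. It reuses the two paths from the question's Dockerfile. The helper name add_spark_paths and the fallback to /opt/spark when SPARK_HOME is not visible in the console are assumptions made for illustration, not anything PyCharm provides.

```python
import os
import sys

# Paths taken from the question's Dockerfile; adjust if Spark lives elsewhere.
# Falling back to /opt/spark is an assumption in case SPARK_HOME is not set.
SPARK_HOME = os.environ.get("SPARK_HOME", "/opt/spark")
SPARK_PATHS = [
    os.path.join(SPARK_HOME, "python"),
    os.path.join(SPARK_HOME, "python", "lib", "py4j-0.9-src.zip"),
]


def add_spark_paths(paths=SPARK_PATHS):
    """Append each path to sys.path once and return the ones that were added."""
    added = []
    for path in paths:
        if path not in sys.path:
            sys.path.append(path)
            added.append(path)
    return added


# Hypothetical helper, called once when the console starts.
add_spark_paths()
```

Appending rather than replacing leaves the entries PyCharm adds for its own helpers in place. With the Environment variables field, the equivalent would presumably be a PYTHONPATH entry holding the same two paths; it is worth checking os.environ['PYTHONPATH'] in a fresh console to confirm how PyCharm combines it with its own value.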

Also read the article below if you need further help: https://www.jetbrains.com/help/pycharm/python-console.html

answered Oct 02 '22 19:10 by Tarun Lalwani