I am building a docker container using the following Dockerfile:
FROM ubuntu:14.04
RUN apt-get update
RUN apt-get install -y python python-dev python-pip
ADD . /app
RUN apt-get install -y python-scipy
RUN pip install -r /arrc/requirements.txt
EXPOSE 5000
WORKDIR /app
CMD python app.py
Everything goes well until I run the image and get the following error:
**********************************************************************
Resource u'tokenizers/punkt/english.pickle' not found. Please
use the NLTK Downloader to obtain the resource: >>>
nltk.download()
Searched in:
- '/root/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- u''
**********************************************************************
I have had this problem before, and it is discussed here; however, I am not sure how to approach it using Docker. I have tried:
CMD python
CMD import nltk
CMD nltk.download()
as well as:
CMD python -m nltk.downloader -d /usr/share/nltk_data popular
But am still getting the error.
Install NumPy (optional): run sudo pip install -U numpy. Install NLTK: run sudo pip install -U nltk. Test the installation: run python, then type import nltk.
In your Dockerfile, try adding instead:
RUN python -m nltk.downloader punkt
This runs the downloader during the image build and installs the requested files to NLTK's default data directory (the error message above shows /root/nltk_data is searched first when running as root).
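Putting it together, a minimal sketch of how the questioner's Dockerfile could look with the download moved to build time (the /app paths and file layout are assumptions based on the question):

```dockerfile
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y python python-dev python-pip python-scipy
ADD . /app
WORKDIR /app
RUN pip install -r requirements.txt
# RUN executes at build time, so punkt is baked into the image
RUN python -m nltk.downloader punkt
EXPOSE 5000
CMD python app.py
```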
The problem is most likely related to using CMD vs. RUN in the Dockerfile. From the documentation for CMD:
The main purpose of a CMD is to provide defaults for an executing container.
CMD runs during docker run <image>, not during the build, and only the last CMD in a Dockerfile takes effect. So your earlier CMD lines were simply overridden by the final CMD python app.py line. Use RUN instead, so the download happens while the image is built.
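To make the distinction concrete, here is a tiny illustrative Dockerfile (the echoed strings are placeholders): every RUN executes while building, while only the final CMD is kept as the container's default command.

```dockerfile
FROM ubuntu:14.04
RUN echo "runs during docker build"   # executed once, at build time
CMD echo "first CMD"                  # discarded: a later CMD replaces it
CMD echo "second CMD"                 # this is what docker run will execute
```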
Well, I tried all the methods suggested but nothing worked, so I realized that the nltk module searches in /root/nltk_data.
Step 1: I downloaded punkt on my machine by using
python3
>>> import nltk
>>> nltk.download('punkt')
And punkt was in /root/nltk_data/tokenizers.
Step 2: I copied the tokenizers folder to my directory, and my directory looked something like this:
.
|-app/
|-tokenizers/
|--punkt/
|---all those pkl files
|--punkt.zip
Step 3: I then modified the Dockerfile to copy that folder into my Docker image:
COPY ./tokenizers /root/nltk_data/tokenizers
Step 4: The new image had punkt.
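The error message in the question lists the directories NLTK searches. A small stdlib-only sketch (directory list copied from that message) can confirm, inside the container, that the copied tokenizers folder landed somewhere NLTK will actually look:

```python
import os

# Search directories taken verbatim from the NLTK error message, in order
SEARCH_DIRS = [
    "/root/nltk_data",
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
]

def find_punkt(search_dirs=SEARCH_DIRS):
    """Return the first directory containing tokenizers/punkt, or None."""
    for base in search_dirs:
        if os.path.isdir(os.path.join(base, "tokenizers", "punkt")):
            return base
    return None

if __name__ == "__main__":
    print(find_punkt() or "punkt not found - copy tokenizers/ into one of the dirs above")
```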
I was facing the same issue when creating a Docker image from an Ubuntu base image with Python 3 for a Django application.
I resolved it as shown below.
# start from an official image
FROM ubuntu:16.04
RUN apt-get update \
&& apt-get install -y python3-pip python3-dev \
&& apt-get install -y libmysqlclient-dev python3-virtualenv
# arbitrary location choice: you can change the directory
RUN mkdir -p /opt/services/djangoapp/src
WORKDIR /opt/services/djangoapp/src
# copy our project code
COPY . /opt/services/djangoapp/src
# install dependency for running service
RUN pip3 install -r requirements.txt
RUN python3 -m nltk.downloader punkt
RUN python3 -m nltk.downloader wordnet
# Setup supervisord
RUN mkdir -p /var/log/supervisor
COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf
# Start processes
CMD ["/usr/bin/supervisord"]
I got this to work for Google Cloud Build by indicating a download destination within the container.
RUN [ "python3", "-c", "import nltk; nltk.download('punkt', download_dir='/usr/local/nltk_data')" ]
Full Dockerfile
FROM python:3.8.3
WORKDIR /app
ADD . /app
# install requirements
RUN pip3 install --upgrade pip
RUN pip3 install --no-cache-dir --compile -r requirements.txt
RUN [ "python3", "-c", "import nltk; nltk.download('punkt', download_dir='/usr/local/nltk_data')" ]
CMD exec uvicorn --host 0.0.0.0 --port $PORT main:app
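An alternative to passing download_dir in every call is the NLTK_DATA environment variable, which NLTK prepends to its search path at import time (it accepts multiple directories separated by the OS path separator). In a Dockerfile this pairs naturally with ENV NLTK_DATA=/usr/local/nltk_data. The helper below is a stdlib-only sketch that mirrors only the splitting behavior, for illustration, not NLTK's exact internals:

```python
import os

def nltk_data_dirs(env=None):
    """Split an NLTK_DATA-style variable into individual directories.

    NLTK itself prepends these entries to nltk.data.path on import;
    this helper mirrors just the splitting step.
    """
    env = os.environ if env is None else env
    raw = env.get("NLTK_DATA", "")
    return [p for p in raw.split(os.pathsep) if p]
```

With NLTK_DATA set in the image, the data downloaded at build time is found again at run time without any code changes.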