Install pandas in a Dockerfile

I am trying to create a Docker image. The Dockerfile is the following:

# Use the official Python 3.6.5 image
FROM python:3.6.5-alpine3.7

# Set the working directory to /app
WORKDIR /app

# Get the 
COPY requirements.txt /app
RUN pip3 install --no-cache-dir -r requirements.txt

# Configuring access to Jupyter
RUN mkdir /notebooks
RUN jupyter notebook --no-browser --ip 0.0.0.0 --port 8888 /notebooks

The requirements.txt file is:

jupyter
numpy==1.14.3
pandas==0.23.0rc2
scipy==1.0.1
scikit-learn==0.19.1
pillow==5.1.1
matplotlib==2.2.2
seaborn==0.8.1

Running the command docker build -t standard . gives me an error when Docker tries to install pandas. The error is the following:

Collecting pandas==0.23.0rc2 (from -r requirements.txt (line 3))
  Downloading https://files.pythonhosted.org/packages/46/5c/a883712dad8484ef907a2f42992b122acf2bcecbb5c2aa751d1033908502/pandas-0.23.0rc2.tar.gz (12.5MB)
    Complete output from command python setup.py egg_info:
    /bin/sh: svnversion: not found
    /bin/sh: svnversion: not found
    non-existing path in 'numpy/distutils': 'site.cfg'
    Could not locate executable gfortran
    ... (loads of other stuff)
    Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-xb6f6a5o/pandas/
The command '/bin/sh -c pip3 install --no-cache-dir -r requirements.txt' returned a non-zero code: 1

When I try to install an older version, pandas==0.22.0, I get this error:

Step 5/7 : RUN pip3 install --no-cache-dir -r requirements.txt
 ---> Running in 5810ea896689
Collecting jupyter (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/83/df/0f5dd132200728a86190397e1ea87cd76244e42d39ec5e88efd25b2abd7e/jupyter-1.0.0-py2.py3-none-any.whl
Collecting numpy==1.14.3 (from -r requirements.txt (line 2))
  Downloading https://files.pythonhosted.org/packages/b0/2b/497c2bb7c660b2606d4a96e2035e92554429e139c6c71cdff67af66b58d2/numpy-1.14.3.zip (4.9MB)
Collecting pandas==0.22.0 (from -r requirements.txt (line 3))
  Downloading https://files.pythonhosted.org/packages/08/01/803834bc8a4e708aedebb133095a88a4dad9f45bbaf5ad777d2bea543c7e/pandas-0.22.0.tar.gz (11.3MB)
  Could not find a version that satisfies the requirement Cython (from versions: )
No matching distribution found for Cython
The command '/bin/sh -c pip3 install --no-cache-dir -r requirements.txt' returned a non-zero code: 1

I also tried to install Cython and setuptools before pandas, but that gave the same No matching distribution found for Cython error at the pip3 install pandas line.

How can I get pandas installed?

ccasimiro9444 asked May 05 '18 14:05


4 Answers

I realize this question has been answered, but I recently had a similar issue with numpy and pandas dependencies in a dockerized project. I hope this will be of benefit to someone in the future.

My solution:

As pointed out by Aviv Sela, Alpine does not contain build tools by default, so they need to be added through the Dockerfile. Below is my Dockerfile with the build packages required for numpy and pandas to be installed successfully in an Alpine container.

FROM python:3.6-alpine3.7

RUN apk add --no-cache --update \
    python3 python3-dev gcc \
    gfortran musl-dev g++ \
    libffi-dev openssl-dev \
    libxml2 libxml2-dev \
    libxslt libxslt-dev \
    libjpeg-turbo-dev zlib-dev

RUN pip install --upgrade pip

ADD requirements.txt .
RUN pip install -r requirements.txt

The requirements.txt

numpy==1.17.1
pandas==0.25.1

EDIT:

Add the following line to the Dockerfile, before the RUN command that upgrades pip. As Bishwas Mishra pointed out in a comment, it is critical to the successful installation of pandas.

RUN pip install --upgrade cython
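
Putting the pieces of this answer together, the combined Dockerfile might look like the sketch below. It assumes the Cython line goes immediately before the pip upgrade, as the edit describes. The package list is copied from the Dockerfile above; using COPY instead of ADD and passing --no-cache-dir are my own choices, not part of the original answer.

```dockerfile
FROM python:3.6-alpine3.7

# Compilers and headers needed to build numpy and pandas from source on Alpine
# (same package list as in the answer above)
RUN apk add --no-cache --update \
    python3 python3-dev gcc \
    gfortran musl-dev g++ \
    libffi-dev openssl-dev \
    libxml2 libxml2-dev \
    libxslt libxslt-dev \
    libjpeg-turbo-dev zlib-dev

# Cython first, per the edit, then pip itself
RUN pip install --upgrade cython
RUN pip install --upgrade pip

# Assumption: COPY is used here in place of ADD; for a single local file both behave the same
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
```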
Kevin Smith answered Oct 17 '22 05:10


Alpine does not contain build tools by default. Install the build tools and create a symbolic link for the locale header:

$ apk add --update curl gcc g++
$ ln -s /usr/include/locale.h /usr/include/xlocale.h
$ pip install numpy

Based on https://wired-world.com/?p=100
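
For use in a Dockerfile rather than an interactive shell, the same three steps might look like the sketch below. The base image tag is borrowed from the question and is an assumption here, since this answer does not name one.

```dockerfile
# Assumption: base image taken from the question, not from this answer
FROM python:3.6.5-alpine3.7

# Build tools that Alpine does not ship by default
RUN apk add --update curl gcc g++

# Symlink from the answer: lets a build that looks for xlocale.h find locale.h instead
RUN ln -s /usr/include/locale.h /usr/include/xlocale.h

RUN pip install numpy
```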

Aviv Sela answered Oct 17 '22 06:10


Using a new version of Python that pandas does not yet support will result in problems.

I found it does not work with a development version of Python:

FROM python:3.9.0a6-buster


RUN apt-get update && \
    apt-get -y install python3-pandas

COPY requirements.txt ./
RUN pip3 install --no-cache-dir -r requirements.txt

requirements.txt:

numpy==1.18
pandas

I found it DOES work with an officially released version of Python:

FROM python:3.8-buster
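
As a sketch, the working variant would presumably be the same Dockerfile with only the base image changed. The apt-get step is left out below on the assumption that pip alone is enough: the official python images keep their interpreter under /usr/local, while python3-pandas from apt installs into Debian's own system Python.

```dockerfile
# Released Python version instead of the 3.9 alpha
FROM python:3.8-buster

# Assumption: requirements.txt is the same file shown above (numpy==1.18, pandas)
COPY requirements.txt ./
RUN pip3 install --no-cache-dir -r requirements.txt
```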
jersey bean answered Oct 17 '22 05:10


You're probably going to be better off building from a pandas image instead of the base Python image. This will make iteration much faster and easier, because you won't ever have to reinstall pandas. I like amancevice/pandas ( https://hub.docker.com/r/amancevice/pandas/tags ). There are Alpine and Debian images available for every pandas tag, although I think they may all be Python 3.7 now.
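
A minimal sketch of what that could look like. PANDAS_TAG is a hypothetical build argument introduced for illustration; it deliberately has no default, so pick a real tag from the Docker Hub page linked above. It also assumes requirements.txt lists only the extra packages, since pandas already ships in the image.

```dockerfile
# PANDAS_TAG is a hypothetical build argument; supply a real tag at build time
ARG PANDAS_TAG
FROM amancevice/pandas:${PANDAS_TAG}

WORKDIR /app

# Assumption: requirements.txt holds only the remaining packages
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
```

It would be built with something like docker build --build-arg PANDAS_TAG=<tag> -t standard . where <tag> is one of the tags listed on that page.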

Rebeku answered Oct 17 '22 06:10