
Why does it take ages to install Pandas on Alpine Linux


Debian-based images can install these packages with pip straight from prebuilt wheels (.whl files):

  Downloading pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl (26.2MB)
  Downloading numpy-1.14.1-cp36-cp36m-manylinux1_x86_64.whl (12.2MB)

The wheel (.whl) format was developed as a quicker and more reliable way to install Python software than rebuilding from source every time. A wheel only has to be unpacked into the correct location on the target system, whereas a source distribution requires a build step before installation.
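The wheel filenames in the pip output above encode what each file was built for. As a rough sketch (the helper name parse_wheel_filename is made up for illustration, and it assumes the common five-field filename without an optional build tag), the fields can be pulled apart like this:

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its dash-separated fields.

    Illustrative helper only: assumes exactly five fields, i.e. no build tag.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {filename}")
    return WheelTags(*parts)


tags = parse_wheel_filename("pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl")
print(tags.python_tag)    # cp36
print(tags.platform_tag)  # manylinux1_x86_64
```

The last field, the platform tag, is the part that matters for this question: pip only accepts a wheel whose platform tag matches the system it is running on.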

The manylinux wheels for pandas and numpy are not supported on Alpine-based images. That is why, when we install them with pip during the image build, pip always downloads the source archives and compiles them on Alpine:

  Downloading pandas-0.22.0.tar.gz (11.3MB)
  Downloading numpy-1.14.1.zip (4.9MB)

and we can see the compiler running inside the container during the image build:

/ # ps aux
PID   USER     TIME   COMMAND
    1 root       0:00 /bin/sh -c pip install pandas
    7 root       0:04 {pip} /usr/local/bin/python /usr/local/bin/pip install pandas
   21 root       0:07 /usr/local/bin/python -c import setuptools, tokenize;__file__='/tmp/pip-build-en29h0ak/pandas/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n
  496 root       0:00 sh
  660 root       0:00 /bin/sh -c gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -DTHREAD_STACK_SIZE=0x100000 -fPIC -Ibuild/src.linux-x86_64-3.6/numpy/core/src/pri
  661 root       0:00 gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -DTHREAD_STACK_SIZE=0x100000 -fPIC -Ibuild/src.linux-x86_64-3.6/numpy/core/src/private -Inump
  662 root       0:00 /usr/libexec/gcc/x86_64-alpine-linux-musl/6.4.0/cc1 -quiet -I build/src.linux-x86_64-3.6/numpy/core/src/private -I numpy/core/include -I build/src.linux-x86_64-3.6/numpy/core/includ
  663 root       0:00 ps aux

If we modify the Dockerfile a little to install the manylinux wheel directly:

FROM python:3.6.4-alpine3.7
RUN apk add --no-cache g++ wget
RUN wget https://pypi.python.org/packages/da/c6/0936bc5814b429fddb5d6252566fe73a3e40372e6ceaf87de3dec1326f28/pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl
RUN pip install pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl

we get the following error:

Step 4/4 : RUN pip install pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl
 ---> Running in 0faea63e2bda
pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl is not a supported wheel on this platform.
The command '/bin/sh -c pip install pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl' returned a non-zero code: 1
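To see which C library the interpreter itself is running on, the standard library's platform.libc_ver() can help. This is only a sketch: the helper libc_family is made up for illustration, and how platform.libc_ver() reports musl varies by Python version (it may return empty strings), so an empty result is treated here as "unknown" rather than as proof of musl.

```python
import platform


def libc_family(name: str) -> str:
    """Map the libc name reported by platform.libc_ver() to a coarse family."""
    lowered = name.lower()
    if lowered == "glibc":
        return "glibc"
    if lowered == "musl":
        return "musl"
    # An empty or unfamiliar name means detection failed; it is not proof of musl.
    return "unknown"


name, version = platform.libc_ver()
print(f"libc: {name or '?'} {version or '?'} -> {libc_family(name)}")
```

On a Debian-based image this would be expected to report glibc. Anything else is a hint that manylinux wheels will be rejected, as in the error above.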

Unfortunately, the only way to install pandas on an Alpine image is to wait until the build finishes.

Of course, if you want to use an Alpine image with pandas, in CI for example, the best approach is to compile it once, push the result to a registry, and use it as a base image for your needs.

EDIT: If you want to use an Alpine image with pandas, you can pull my nickgryg/alpine-pandas Docker image. It is a Python image with pandas pre-compiled on the Alpine platform. It should save you time.


ANSWER: AS OF 3/9/2020, FOR PYTHON 3, IT STILL DOESN'T!

Here is a complete working Dockerfile:

FROM python:3.7-alpine
RUN echo "@testing http://dl-cdn.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories
RUN apk add --update --no-cache py3-numpy py3-pandas@testing

The build is very sensitive to the exact Python and Alpine version numbers. Getting these wrong seems to provoke Max Levy's error so:libpython3.7m.so.1.0 (missing), but the above does now work for me.

My updated Dockerfile is available at https://gist.github.com/jtlz2/b0f4bc07ce2ff04bc193337f2327c13b


[Earlier Update:]

ANSWER: IT DOESN'T!

In any Alpine Dockerfile you can simply do*

RUN apk add py2-numpy@community py2-scipy@community py-pandas@edge

This is because numpy, scipy and now pandas are all available prebuilt on alpine:

https://pkgs.alpinelinux.org/packages?name=*numpy

https://pkgs.alpinelinux.org/packages?name=*scipy&branch=edge

https://pkgs.alpinelinux.org/packages?name=*pandas&branch=edge

One way to avoid rebuilding every time, or using a Docker layer, is to use a prebuilt, native Alpine Linux/.apk package, e.g.

https://github.com/sgerrand/alpine-pkg-py-pandas

https://github.com/nbgallery/apks

You can build these .apks once and use them wherever in your Dockerfile you like :)

This also saves you having to bake everything else into the Docker image before the fact - i.e. the flexibility to pre-build any Docker image you like.

PS I have put a Dockerfile stub at https://gist.github.com/jtlz2/b0f4bc07ce2ff04bc193337f2327c13b that shows roughly how to build the image. It includes the important steps (*):

RUN echo "@community http://dl-cdn.alpinelinux.org/alpine/edge/community" >> /etc/apk/repositories
RUN apk update
RUN apk add --update --no-cache libgfortran

My honest advice: switch to a Debian-based image and all of these problems go away.

Alpine doesn't work well for Python applications.

Here is an example of my dockerfile:

FROM python:3.7.6-buster

RUN pip install pandas==1.0.0
RUN pip install scikit-learn
RUN pip install Django==3.0.2
RUN pip install cx_Oracle==7.3.0
RUN pip install excel
RUN pip install djangorestframework==3.11.0

The python:3.7.6-buster image is more appropriate in this case. In addition, you don't need any extra OS-level dependencies.

See this useful and recent article, https://pythonspeed.com/articles/alpine-docker-python/, which says:

Don’t use Alpine Linux for Python images Unless you want massively slower build times, larger images, more work, and the potential for obscure bugs, you’ll want to avoid Alpine Linux as a base image. For some recommendations on what you should use, see my article on choosing a good base image.


ATTENTION
Look at the @jtlz2 answer with the latest update

OUTDATED

The py3-pandas and py3-numpy packages have moved to the Alpine testing repository, so you can install them by adding these lines to your Dockerfile:

RUN echo "http://dl-8.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories \
  && apk update \
  && apk add py3-numpy py3-pandas

Hope it helps someone!

Alpine packages links:
- py3-pandas
- py3-numpy

Alpine repositories docs info.


Just going to bring some of these answers together in one answer and add a detail I think was missed. The reason certain Python libraries, particularly optimized math and data libraries, take so long to build on Alpine is that the pip wheels for these libraries include binaries precompiled from C/C++ and linked against GNU libc (glibc), a common implementation of the C standard library. Debian, Fedora and CentOS all (typically) use glibc, but Alpine, in order to stay lightweight, uses musl libc instead. C/C++ binaries built on a glibc system will not work on a system without glibc, and the same goes for musl.

Pip looks first for a wheel with the correct binaries. If it can't find one, it tries to compile the binaries from the C/C++ source and link them against musl. In many cases this won't even work unless you have the Python headers from python3-dev and build tools like make.
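The fallback described above can be sketched in a few lines. This is a deliberately simplified model, not pip's actual resolver: the function names are made up for illustration, and the compatibility rule only distinguishes "any" (pure Python) wheels from manylinux (glibc) wheels.

```python
from typing import List, Optional


def platform_matches(platform_tag: str, libc: str) -> bool:
    """Rough compatibility rule (simplified; real pip checks many more tags)."""
    if platform_tag == "any":
        return True  # pure-Python wheel, contains no compiled code
    if platform_tag.startswith("manylinux"):
        return libc == "glibc"  # manylinux binaries are linked against glibc
    return False


def choose_distribution(filenames: List[str], libc: str) -> Optional[str]:
    """Return the first compatible wheel, else the first source archive, else None."""
    sdists = []
    for filename in filenames:
        if filename.endswith(".whl"):
            platform_tag = filename[: -len(".whl")].split("-")[-1]
            if platform_matches(platform_tag, libc):
                return filename
        elif filename.endswith((".tar.gz", ".zip")):
            sdists.append(filename)
    return sdists[0] if sdists else None


available = [
    "pandas-0.22.0.tar.gz",
    "pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl",
]
print(choose_distribution(available, "glibc"))  # the manylinux1 wheel
print(choose_distribution(available, "musl"))   # pandas-0.22.0.tar.gz
```

With libc set to "musl" the wheel is skipped and the source archive is chosen, which is the point at which the long compile step begins.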

Now the silver lining: as others have mentioned, there are apk packages with the proper binaries provided by the community. Using these will save you the (sometimes lengthy) process of building the binaries.

You can, in fact, install from a pure-Python .whl on Alpine, but at the time of this writing manylinux did not support binary distributions for Alpine because of the musl/glibc issue.


This worked for me:

FROM python:3.8-alpine
RUN echo "@testing http://dl-cdn.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories
RUN apk add --update --no-cache py3-numpy py3-pandas@testing
ENV PYTHONPATH=/usr/lib/python3.8/site-packages

COPY . /app
WORKDIR /app

RUN pip install -r requirements.txt

EXPOSE 5003 
ENTRYPOINT [ "python" ] 
CMD [ "app.py" ]

Most of the code here is from the answer of jtlz2 from this same thread and Faylixe from another thread.

It turns out that a lighter build of pandas is available from the Alpine repository (alongside py3-numpy), but apk does not install it into the path from which this image's Python reads imports by default. That is why you need to add the ENV PYTHONPATH line. Also be mindful of the Alpine version.
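To illustrate why the extra search path matters, here is a small sketch using only the standard library. The module name demo_apk_module and the temporary directory are stand-ins invented for this example. They play the role of a package that apk installed somewhere the interpreter does not look by default.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def module_origin(name: str):
    """Return the file a module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None


with tempfile.TemporaryDirectory() as extra_dir:
    # Hypothetical module standing in for a package installed outside sys.path.
    (Path(extra_dir) / "demo_apk_module.py").write_text("VALUE = 1\n")

    print(module_origin("demo_apk_module"))  # None: directory is not on sys.path
    sys.path.append(extra_dir)               # PYTHONPATH adds entries like this at startup
    importlib.invalidate_caches()
    print(module_origin("demo_apk_module"))  # now resolves to a file inside extra_dir
    sys.path.remove(extra_dir)
```

Inside the container, running module_origin("pandas") with and without the PYTHONPATH setting would be one way to confirm which site-packages directory the import actually comes from.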


In this case Alpine may not be the best solution. Swap alpine for slim:

FROM python:3.8.3-alpine

Change it to:

FROM python:3.8.3-slim

In my case it was resolved with this small change.