Reducing the size of scipy and numpy for aws lambda deployment

I am trying to deploy a Python application on AWS Lambda. It has several large Python dependencies, the largest being scipy and numpy. As a result, my deployment package is significantly larger than the allowed 250 MB.

While trying to find a way to reduce the size, I came across the approach detailed here:

https://github.com/szelenka/shrink-linalg

In essence, when installing with pip, flags can be passed to the C compiler during the scipy and numpy Cython compilation so that debugging information is left out of the compiled binaries. The result is that scipy and numpy are reduced to about 50% of their original size. I was able to run this locally (Ubuntu 16.04) and create the binaries without issue. The command used was:

CFLAGS="-g0 -I/usr/include:/usr/local/include -L/usr/lib:/usr/local/lib" pip install numpy scipy --compile --no-cache-dir --global-option=build_ext --global-option="-j 4"

The problem is that in order to run on AWS Lambda, the binaries must be compiled in an environment similar to the one Lambda runs in. An image of the environment can be found here:

https://docs.aws.amazon.com/lambda/latest/dg/current-supported-versions.html

After loading the image on an EC2 instance, I tried to run the same pip installation after first installing a few dependencies:

sudo yum install python36 python3-devel blas-devel atlas atlas-devel lapack-devel atlas-sse3-devel gcc gcc-64 gcc-gfortran gcc64-gfortran libgfortran gcc-c++ openblas-devel python36-virtualenv

numpy compiles fine, but scipy does not. The Cython step is not causing any problems; the Fortran compilation is. I am getting the following error:

error: Command "/usr/bin/gfortran -Wall -g -Wall -g -shared build/temp.linux-x86_64-3.6/build/src.linux-x86_64-3.6/scipy/integrate/_test_odeint_bandedmodule.o build/temp.linux-x86_64-3.6/build/src.linux-x86_64-3.6/build/src.linux-x86_64-3.6/scipy/integrate/fortranobject.o build/temp.linux-x86_64-3.6/scipy/integrate/tests/banded5x5.o build/temp.linux-x86_64-3.6/build/src.linux-x86_64-3.6/scipy/integrate/_test_odeint_banded-f2pywrappers.o -L/usr/lib64/atlas -L/usr/lib/gcc/x86_64-amazon-linux/6.4.1 -L/usr/lib/gcc/x86_64-amazon-linux/6.4.1 -L/usr/lib64 -Lbuild/temp.linux-x86_64-3.6 -llsoda -lmach -llapack -lptf77blas -lptcblas -latlas -lptf77blas -lptcblas -lpython3.6m -lgfortran -o build/lib.linux-x86_64-3.6/scipy/integrate/_test_odeint_banded.cpython-36m-x86_64-linux-gnu.so -Wl,--version-script=build/temp.linux-x86_64-3.6/link-version-scipy.integrate._test_odeint_banded.map" failed with exit status 1
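When a scipy build fails like this, pip typically prints only the final link command, not the compiler diagnostic behind it. One way to surface the underlying error (a debugging sketch, not a fix) is to rerun the build verbosely, or to paste the failing gfortran command back into the shell and read its output directly:

# Verbose output includes the gfortran/ld messages pip hides by default
pip install scipy --no-cache-dir -v 2>&1 | tee scipy-build.log

# Alternatively, rerun the exact command pip reported as failing;
# the linker will then print why it exited with status 1
/usr/bin/gfortran -Wall -g -shared ...   # paste the full command from the error above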

I have tried reinstalling gfortran as well as the whole gcc collection, but without any luck. Unfortunately, I have very limited experience with Fortran compilers. If anyone has any ideas, or has a compiled version of the binaries, I'd be quite grateful.

asked Nov 13 '18 by joek575



1 Answer

Using the serverless-python-requirements plugin for the Serverless Framework helped me streamline this whole process and reduce the package size as well. I would definitely recommend checking it out.

This is the guide that I followed:

Serverless python-requirements plugin
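If you are not using the plugin already, it can be added to an existing Serverless project with the plugin install helper (a sketch assuming a working serverless CLI and an existing serverless.yml):

# Adds the plugin as a dev dependency and registers it in serverless.yml
serverless plugin install -n serverless-python-requirements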

Make sure to set the strip flag to false; otherwise the plugin strips symbols from the compiled shared libraries, which leads to the error "ELF load command address/offset not properly aligned".

This is what my final serverless.yml came out to be, which gave me the result I wanted: sklearn + cv2 packaged as a layer:

custom:
  pythonRequirements:
    dockerizePip: true
    useDownloadCache: true
    useStaticCache: false
    slim: true
    strip: false
    layer:
      name: ${self:provider.stage}-cv2-sklearn
      description: Python requirements lambda layer
      compatibleRuntimes:
        - python3.8
      allowedAccounts:
        - '*'
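To actually attach the layer to a function, the plugin exposes it through a CloudFormation reference. A minimal usage sketch (the function name and handler are illustrative; the Ref name follows the plugin's documented convention):

functions:
  inference:
    handler: handler.main
    layers:
      # Reference the layer built by serverless-python-requirements
      - Ref: PythonRequirementsLambdaLayer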
answered Sep 23 '22 by Amir Mousavi