I am trying to deploy a Python application on AWS Lambda. It has several large Python dependencies, the largest being SciPy and NumPy. As a result, my application is significantly larger than the allowed 250 MB.
While trying to find a way to reduce the size, I came across the approach detailed here:
https://github.com/szelenka/shrink-linalg
In essence, when installing with pip, flags can be passed to the C compiler during the SciPy and NumPy Cython compilation so that debugging information is left out of the compiled binaries. The result is that SciPy and NumPy are reduced to about 50% of their original size. I was able to run this locally (Ubuntu 16.04) and created the binaries without issue. The command used was:
CFLAGS="-g0 -I/usr/include:/usr/local/include -L/usr/lib:/usr/local/lib" pip install numpy scipy --compile --no-cache-dir --global-option=build_ext --global-option="-j 4"
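To check how much the rebuild actually saves, the installed package directories can be measured before and after. This is only a minimal sketch: the helper names `dir_size_mb` and `report` are made up for illustration, and the site-packages path is whatever your environment uses.

```python
import os

# Limit quoted in the question above; used only for comparison.
LAMBDA_UNZIPPED_LIMIT_MB = 250


def dir_size_mb(path):
    """Sum the sizes of all regular files under path, in MiB."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so shared libraries are not counted twice.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total / (1024 * 1024)


def report(site_packages, packages=("numpy", "scipy")):
    """Return {package: size in MiB} for each package directory that exists."""
    sizes = {}
    for pkg in packages:
        pkg_dir = os.path.join(site_packages, pkg)
        if os.path.isdir(pkg_dir):
            sizes[pkg] = dir_size_mb(pkg_dir)
    return sizes
```

Running `report()` against the site-packages directory once after a plain install and once after the `CFLAGS="-g0 ..."` build gives the two numbers to compare against `LAMBDA_UNZIPPED_LIMIT_MB`.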
The problem is that in order to run on AWS Lambda, the binaries must be compiled in an environment similar to the one Lambda runs on. An image of the environment can be found here:
https://docs.aws.amazon.com/lambda/latest/dg/current-supported-versions.html
After loading the image on an EC2 instance, I tried to run the same pip installation after installing a few dependencies:
sudo yum install python36 python3-devel blas-devel atlas atlas-devel lapack-devel atlas-sse3-devel gcc gcc-64 gcc-gfortran gcc64-gfortran libgfortran gcc-c++ openblas-devel python36-virtualenv
NumPy compiles fine, but SciPy does not. The Cython step is not causing any problems, but the Fortran compilation is. I am getting the following error:
error: Command "/usr/bin/gfortran -Wall -g -Wall -g -shared build/temp.linux-x86_64-3.6/build/src.linux-x86_64-3.6/scipy/integrate/_test_odeint_bandedmodule.o build/temp.linux-x86_64-3.6/build/src.linux-x86_64-3.6/build/src.linux-x86_64-3.6/scipy/integrate/fortranobject.o build/temp.linux-x86_64-3.6/scipy/integrate/tests/banded5x5.o build/temp.linux-x86_64-3.6/build/src.linux-x86_64-3.6/scipy/integrate/_test_odeint_banded-f2pywrappers.o -L/usr/lib64/atlas -L/usr/lib/gcc/x86_64-amazon-linux/6.4.1 -L/usr/lib/gcc/x86_64-amazon-linux/6.4.1 -L/usr/lib64 -Lbuild/temp.linux-x86_64-3.6 -llsoda -lmach -llapack -lptf77blas -lptcblas -latlas -lptf77blas -lptcblas -lpython3.6m -lgfortran -o build/lib.linux-x86_64-3.6/scipy/integrate/_test_odeint_banded.cpython-36m-x86_64-linux-gnu.so -Wl,--version-script=build/temp.linux-x86_64-3.6/link-version-scipy.integrate._test_odeint_banded.map" failed with exit status 1
I have tried reinstalling gfortran as well as the whole GCC collection, but without any luck. Unfortunately, I have very limited experience with Fortran compilers. If anyone has any ideas, or has a compiled version of the binaries, I'd be quite grateful.
One of the simpler workarounds is to move the large file into cloud storage such as S3. At Lambda runtime, download the file and continue processing as required. In this example, we can move mock_data.csv to S3 and update the handler function to download the file from S3 before doing anything else.
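A rough sketch of what such a handler could look like, assuming boto3 (which the Lambda Python runtimes include). The bucket name, the `DATA_BUCKET` environment variable, and the helper `ensure_local_copy` are placeholders invented for illustration, not part of the original example.

```python
import os

# Hypothetical names: replace with your own bucket and key.
BUCKET = os.environ.get("DATA_BUCKET", "my-example-bucket")
KEY = "mock_data.csv"
LOCAL_PATH = "/tmp/mock_data.csv"  # /tmp is the writable directory in Lambda


def ensure_local_copy(s3_client, bucket=BUCKET, key=KEY, local_path=LOCAL_PATH):
    """Download the object only if it is not already on local disk."""
    if not os.path.exists(local_path):
        s3_client.download_file(bucket, key, local_path)
    return local_path


def handler(event, context):
    import boto3  # imported here so the helper above can be tested without it

    path = ensure_local_copy(boto3.client("s3"))
    with open(path) as f:
        first_line = f.readline().strip()
    return {"first_line": first_line}
```

Because the file is checked on disk first, a warm container that already downloaded it skips the S3 call on later invocations.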
There is a hard limit of 6 MB on the AWS Lambda payload size. This means we cannot send more than 6 MB of data to AWS Lambda in a single request. Developers typically run into this limit when their application uses AWS Lambda as the middleman between the client and their S3 asset storage.
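One way to catch this early is to measure the serialized payload before sending it. The following is only a sketch: it assumes the payload is JSON and that 6 MB means 6 * 1024 * 1024 bytes, and the function names are made up for illustration.

```python
import json

# Assumption: the 6 MB limit is interpreted as 6 * 1024 * 1024 bytes.
MAX_PAYLOAD_BYTES = 6 * 1024 * 1024


def payload_size_bytes(payload):
    """Size of the payload once serialized to UTF-8 encoded JSON."""
    return len(json.dumps(payload).encode("utf-8"))


def fits_in_lambda_payload(payload, limit=MAX_PAYLOAD_BYTES):
    """True if the serialized payload is within the limit."""
    return payload_size_bytes(payload) <= limit
```

If `fits_in_lambda_payload` returns False, the data would have to go through S3 rather than through the request itself.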
Using the serverless-python-requirements package on Serverless helped me streamline this whole process and reduce the package size as well. Would definitely recommend checking it out.
This is the guide that I followed: Serverless python-requirements plugin
Make sure to leave the strip flag set to false to avoid stripping binaries, which leads to the error "ELF load command address/offset not properly aligned".
This is what my final serverless.yml came out to be, which gave me the results I wanted: packaging sklearn + cv2 as a layer:
custom:
  pythonRequirements:
    dockerizePip: true
    useDownloadCache: true
    useStaticCache: false
    slim: true
    strip: false
    layer:
      name: ${self:provider.stage}-cv2-sklearn
      description: Python requirements lambda layer
      compatibleRuntimes:
        - python3.8
      allowedAccounts:
        - '*'