
How to bundle Python for AWS Lambda

I have a project I'd like to run on AWS Lambda, but it exceeds the 50MB zipped limit. The zipped package is currently 128MB, and the project folder with the virtual environment sits at 623MB. The top consumers of space are:

  • scipy (~187MB)
  • pandas (~108MB)
  • numpy (~74.4MB)
  • lambda_packages (~71.4MB)

Without the virtualenv the project is <2MB. The requirements.txt is:

click==6.7
cycler==0.10.0
ecdsa==0.13
Flask==0.12.2
Flask-Cors==3.0.3
future==0.16.0
itsdangerous==0.24
Jinja2==2.10
MarkupSafe==1.0
matplotlib==2.1.2
mpmath==1.0.0
numericalunits==1.19
numpy==1.14.0
pandas==0.22.0
pycryptodome==3.4.7
pyparsing==2.2.0
python-dateutil==2.6.1
python-dotenv==0.7.1
python-jose==2.0.2
pytz==2017.3
scipy==1.0.0
six==1.11.0
sympy==1.1.1
Werkzeug==0.14.1
xlrd==1.1.0

I deploy using Zappa, so my understanding of the whole infrastructure is limited. My understanding is that a few of the libraries do not get uploaded; for numpy, for example, that part is skipped and Amazon's version, already available in the Lambda environment, gets used instead.

I propose the following workflow (without using S3 buckets for slim_handler):

  1. delete all the files that match "test_*.py" in all packages
  2. manually tree shake scipy as I only use scipy.minimize, by deleting most of it and re-running my tests
  3. minify all the code and obfuscate using pyminifier
  4. zappa deploy
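
The deletion steps in this first workflow can be sketched as below. The `venv` path is an assumption; adjust it to your virtualenv's actual location (the guard makes the snippet a no-op if the path differs):

```shell
# Hypothetical site-packages path; adjust to your virtualenv and Python version.
SITE="venv/lib/python3.6/site-packages"

if [ -d "$SITE" ]; then
  # Step 1: remove unit-test files and test directories shipped inside packages.
  find "$SITE" -name "test_*.py" -delete
  find "$SITE" -type d -name tests -prune -exec rm -rf {} +

  # Bonus: numpy/scipy ship .so files with debug symbols; stripping them
  # typically recovers a lot of additional space.
  find "$SITE" -name "*.so" -exec strip {} \; 2>/dev/null
fi
```

Step 2 (tree shaking scipy by hand) is then a matter of deleting subpackages you don't import and re-running your tests after each deletion.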

Or:

  1. run compileall to get .pyc files
  2. delete all *.py files and let zappa upload .pyc files instead
  3. zappa deploy
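
The second workflow can be sketched as follows, assuming Python 3 on the build machine (the path is again an assumption). One caveat: the interpreter that byte-compiles must match Lambda's Python version exactly, or the `.pyc` files won't import:

```shell
# Hypothetical site-packages path; adjust to your virtualenv and Python version.
SITE="venv/lib/python3.6/site-packages"

if [ -d "$SITE" ]; then
  # -b writes mod.pyc next to mod.py (legacy layout), which is required
  # if the .py sources are then deleted.
  python3 -m compileall -b "$SITE"

  # Drop the sources and any stale __pycache__ directories.
  find "$SITE" -name "*.py" -delete
  find "$SITE" -type d -name "__pycache__" -prune -exec rm -rf {} +
fi
```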

I've had issues with slim_handler: true: either my connection drops and the upload fails, or some other error occurs and at ~25% of the upload to S3 I get Could not connect to the endpoint URL. For the purposes of this question, I'd like to get the dependencies down to a manageable size.

Nevertheless, over half a gig of dependencies with the main app being less than 2MB has to be some sort of record.

My questions are:

  1. What is the unzipped limit for AWS? Is it 250MB or 500MB?
  2. Am I on the right track with the above method for reducing package sizes?
  3. Is it possible to go a step further and use .pyz files?
  4. Are there any standard utilities out there that help with the above?
  5. Is there no tree shaking library for python?
asked Jan 27 '18 by dim_voly


1 Answer

  1. The AWS limit is 250MB of unpacked deployment-package code (see https://hackernoon.com/exploring-the-aws-lambda-deployment-limits-9a8384b0bec3).
  2. I would suggest going with the second method and compiling everything. You should also consider using the Serverless Framework; it does not force you to create a virtualenv, which is very heavy.

In my testing, all of your packages can be compressed down to about 83MB (just the packages).

My workaround would be:

  1. use the Serverless Framework (consider moving from Flask directly to API Gateway)
  2. install your packages locally on the same folder using:

    pip install -r requirements.txt -t .
    
  3. try your method of compiling to .pyc files, and remove the .py sources.

  4. Deploy:

    sls deploy
    

Hope it helps.
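
A minimal serverless.yml for the setup above might look like this (the service name, handler path, and runtime are hypothetical placeholders; pick the runtime your packages were built for):

```yaml
# Sketch only: names and paths are placeholders, not a drop-in config.
service: my-flask-api

provider:
  name: aws
  runtime: python3.6
  region: us-east-1

functions:
  app:
    handler: handler.lambda_handler   # a small wrapper around (or replacement for) the Flask app
    events:
      - http:
          path: /{proxy+}
          method: any

package:
  exclude:
    - venv/**
    - "**/test_*.py"
```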

answered Oct 01 '22 by Ran Ribenzaft