
Pandas & AWS Lambda

Does anyone have a fully compiled version of pandas that is compatible with AWS Lambda?

After searching around for a few hours, I cannot seem to find what I'm looking for and the documentation on this subject is non-existent.

I need access to the package in a Lambda function, but I have been unable to get it to compile properly for use there.

In lieu of a precompiled build, can anyone provide reproducible steps to create the binaries?

Unfortunately, I have not been able to reproduce any of the guides on the subject, as they mostly combine pandas with SciPy, which I don't need and which adds an extra layer of burden.

Moe asked Mar 17 '16 08:03




2 Answers

I believe you should be able to use a recent pandas version (or, most likely, the one already on your machine). You can create a Lambda package with pandas yourself like this:

  1. First, find where the pandas package is installed on your machine: open a Python terminal and type

    import pandas
    pandas.__file__

    That should print something like '/usr/local/lib/python3.4/site-packages/pandas/__init__.py'

  2. Now copy the pandas folder from that location (in this case '/usr/local/lib/python3.4/site-packages/pandas') and place it in your repository.
  3. Package your Lambda code with pandas like this:

    zip -r9 my_lambda.zip pandas/
    zip -9 my_lambda.zip my_lambda_function.py
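For anyone who would rather script steps 1 to 3, here is a minimal Python sketch that uses only the standard library. The function names `package_dir` and `build_lambda_zip` are made up for illustration and do not come from the answer. It only packages what it is pointed at: pandas ships compiled extensions and depends on other packages such as NumPy, so the files presumably need to come from an environment compatible with Lambda's, and the dependencies would need the same treatment.

```python
import importlib.util
import os
import zipfile


def package_dir(name):
    """Return the directory an installed package lives in (step 1)."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        raise ImportError("package %r was not found" % name)
    # spec.origin points at <package>/__init__.py for a regular package.
    return os.path.dirname(spec.origin)


def build_lambda_zip(pkg_dir, handler_file, zip_path):
    """Zip pkg_dir recursively plus handler_file (steps 2 and 3).

    Returns the sorted list of names stored in the archive.
    """
    pkg_dir = os.path.abspath(pkg_dir)
    parent = os.path.dirname(pkg_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(pkg_dir):
            for filename in files:
                full = os.path.join(root, filename)
                # Store paths relative to the package's parent so the
                # archive has "pandas/..." at its top level.
                zf.write(full, os.path.relpath(full, parent))
        # The handler module sits at the archive root, next to the package.
        zf.write(handler_file, os.path.basename(handler_file))
        return sorted(zf.namelist())
```

For example, `build_lambda_zip(package_dir("pandas"), "my_lambda_function.py", "my_lambda.zip")` should produce an archive with the same layout as the two zip commands.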

You can also deploy your code to S3 and make your Lambda use the code from S3.

    aws s3 cp my_lambda.zip s3://dev-code/projectx/lambda_packages/
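The same upload can be done from Python. This is only a sketch and assumes the third-party boto3 package is installed and AWS credentials are configured; `s3_destination`, `deploy_from_s3` and the `function_name` argument are hypothetical names, not part of the answer. The second boto3 call points an existing Lambda function at the uploaded object.

```python
import os
from urllib.parse import urlparse


def s3_destination(zip_path, dest_uri):
    """Return (bucket, key) for placing zip_path under an s3://bucket/prefix URI."""
    parsed = urlparse(dest_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("expected an s3://bucket/prefix URI, got %r" % dest_uri)
    prefix = parsed.path.strip("/")
    filename = os.path.basename(zip_path)
    key = prefix + "/" + filename if prefix else filename
    return parsed.netloc, key


def deploy_from_s3(zip_path, dest_uri, function_name):
    """Upload the package to S3, then tell Lambda to use that object."""
    import boto3  # third-party; imported here so s3_destination needs only the stdlib

    bucket, key = s3_destination(zip_path, dest_uri)
    boto3.client("s3").upload_file(zip_path, bucket, key)
    boto3.client("lambda").update_function_code(
        FunctionName=function_name, S3Bucket=bucket, S3Key=key
    )
    return bucket, key
```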

Here's the repo that will get you started

Chenna V answered Oct 30 '22 11:10



After some tinkering and lots of googling, I was able to make everything work and set up a repo that can simply be cloned in the future.

Key takeaways:

  1. All static packages have to be compiled on an EC2 Amazon Linux instance.
  2. The Python code needs to load the libraries in the lib/ folder before executing.
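One way to read the second takeaway, assuming the compiled packages are bundled in a lib/ directory next to the handler file: put that directory on `sys.path` before importing anything from it. This is a sketch of the idea rather than the code from the repo, and `add_lib_to_path` and `handler` are hypothetical names.

```python
import os
import sys


def add_lib_to_path(base_dir, lib_name="lib"):
    """Prepend <base_dir>/<lib_name> to sys.path (only once) and return it."""
    lib_dir = os.path.join(os.path.abspath(base_dir), lib_name)
    if lib_dir not in sys.path:
        sys.path.insert(0, lib_dir)
    return lib_dir


def handler(event, context):
    # Assumption: lib/ sits in the same directory as this file.
    add_lib_to_path(os.path.dirname(os.path.abspath(__file__)))
    import pandas  # imported only after lib/ is on the path

    return {"pandas_version": pandas.__version__}
```

Doing the path setup before the first bundled import is the important part; whether it lives at module level or inside the handler is a matter of taste.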

Github repo: https://github.com/moesy/AWS-Lambda-ML-Microservice-Skeleton

Moe answered Oct 30 '22 09:10
