
AWS Lambda not importing LXML

I am trying to use the LXML module within AWS Lambda and having no luck. I downloaded LXML using the following command:

pip install lxml -t folder

To download it to my lambda function deployment package. I zipped the contents of my lambda function up as I have done with all other lambda functions, and uploaded it to AWS Lambda.

However no matter what I try I get this error when I run the function:

Unable to import module 'handler': /var/task/lxml/etree.so: undefined symbol: PyFPE_jbuf

When I run it locally I don't have any issues; it is only when I run it on Lambda that this error arises.

asked Apr 04 '16 by user3024827



7 Answers

I have solved this using the serverless framework and its built-in Docker feature.

Requirement: You have an AWS profile in your .aws folder that can be accessed.

First, install the serverless framework as described here. You can then create a configuration file using the command serverless create --template aws-python3 --name my-lambda. It will create a serverless.yml file and a handler.py with a simple "hello" function. You can check that it works with sls deploy. If it does, serverless is ready to work with.

Next, we'll need an additional plugin named "serverless-python-requirements" for bundling Python requirements. You can install it via sls plugin install --name serverless-python-requirements.

This plugin does the work needed to fix the missing lxml binary. In the custom -> pythonRequirements section you simply have to add the dockerizePip: non-linux property. Your serverless.yml file could look something like this:

service: producthunt-crawler

provider:
  name: aws
  runtime: python3.8

functions:
  hello:
    # some handler that imports lxml
    handler: handler.hello

plugins:
  - serverless-python-requirements

custom:
  pythonRequirements:
    fileName: requirements.txt
    dockerizePip: non-linux

    # Omits tests, __pycache__, *.pyc etc from dependencies
    slim: true

This will run the bundling of python requirements inside a pre-configured docker container. After this, you can run sls deploy to see the magic happen and then sls invoke -f my_function to check that it works.
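For completeness, the fileName: requirements.txt entry in the configuration above refers to an ordinary pip requirements file placed next to serverless.yml; for this use case a minimal one could contain just:

```
lxml
```

The plugin installs everything listed there inside the Docker container and bundles it into the deployment package.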

If you already deployed with serverless and only added the dockerizePip: non-linux option later, make sure to clean up the already-built requirements with sls requirements clean. Otherwise, the previously built artifacts are reused.

answered Oct 19 '22 by akohout


The lxml library is OS-dependent, so we need a copy precompiled for Amazon Linux. Below are the steps.

  1. Create a docker container.
    docker run -it lambci/lambda:build-python3.8 bash

  2. Create a dir named 'lib' (or any name you like) and install lxml into it.
    mkdir lib
    pip install lxml -t ./lib --no-deps

  3. Open another terminal and run
    docker ps

  4. Copy the container ID from the output.

  5. Copy the files from container to host.
    mkdir /home/libraries/opt/python/lib/python3.8/site-packages/
    docker cp <containerid>:/var/task/lib /home/libraries/opt/python/lib/python3.8/site-packages/

  6. You now have lxml files compiled on an Amazon Linux box. If you would like lxml as a Lambda layer, navigate to /home/libraries/opt and zip the folder named python, naming the zip opt. You can now attach the zip to your Lambda as a layer.

  7. If you want the lxml library inside the Lambda package itself, navigate to /home/libraries/opt/python/lib/python3.8/site-packages/ and copy the lxml folder into your Lambda deployment package.
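As background for steps 6 and 7: Lambda extracts every attached layer under /opt, and the Python runtime looks for packages in /opt/python and /opt/python/lib/python3.8/site-packages, which is why the zip needs a top-level folder named python. A small sketch of those conventional paths (layer_paths is a hypothetical helper, not part of any AWS SDK):

```python
import os

def layer_paths(runtime="python3.8"):
    # Directories the Lambda Python runtime adds to sys.path for layers
    return ["/opt/python",
            os.path.join("/opt/python/lib", runtime, "site-packages")]

print(layer_paths())
# ['/opt/python', '/opt/python/lib/python3.8/site-packages']
```

So a layer zip whose root is python/lxml/... lands exactly where the runtime expects it.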

answered Oct 14 '22 by Chandan Kumar


I faced the same issue.

The link posted by Raphaël Braud was helpful and so was this one: https://nervous.io/python/aws/lambda/2016/02/17/scipy-pandas-lambda/

Using the two links I was able to successfully import lxml and other required packages. Here are the steps I followed:

  • Launch an ec2 machine with Amazon Linux ami
  • Run the following script to accumulate dependencies:

    set -e -o pipefail
    sudo yum -y upgrade
    sudo yum -y install gcc python-devel libxml2-devel libxslt-devel
    
    virtualenv ~/env && cd ~/env && source bin/activate
    pip install lxml
    for dir in lib64/python2.7/site-packages \
         lib/python2.7/site-packages
    do
    if [ -d $dir ] ; then
       pushd $dir; zip -r ~/deps.zip .; popd
    fi
    done  
    mkdir -p local/lib
    cp /usr/lib64/... local/lib/  # list of required .so files
    zip -r ~/deps.zip local/lib
    
  • Create handler and worker files as specified in the link. Sample file contents:

handler.py

import os
import subprocess


libdir = os.path.join(os.getcwd(), 'local', 'lib')

def handler(event, context):
    command = 'LD_LIBRARY_PATH={} python worker.py '.format(libdir)
    output = subprocess.check_output(command, shell=True)

    print output

    return

worker.py:

import lxml

def sample_function( input_string = None):
    return "lxml import successful!"

if __name__ == "__main__":
    result = sample_function()
    print result
  • Add handler.py and worker.py to the zip file.

Here is how the structure of the zip file looks after the above steps:

deps 
├── handler.py
├── worker.py 
├── local
│   └── lib
│       ├── libanl.so
│       ├── libBrokenLocale.so
|       ....
├── lxml
│   ├── builder.py
│   ├── builder.pyc
|       ....
├── <other python packages>
  • Make sure you specify the correct handler name when creating the lambda function. In the above example, it would be "handler.handler".

Hope this helps!

answered Oct 19 '22 by Mask


Extending on these answers, I found the following to work well.

The punchline here is having python compile lxml with static libs, and installing in the current directory rather than site-packages.

It also means you can write your Python code as usual, with no need for a separate worker.py or fiddling with LD_LIBRARY_PATH.

sudo yum groupinstall 'Development Tools'
sudo yum -y install python36-devel python36-pip
sudo ln -s /usr/bin/pip-3.6 /usr/bin/pip3
mkdir lambda && cd lambda
STATIC_DEPS=true pip3 install -t . lxml
zip -r ~/deps.zip *

To take it to the next level, use serverless and Docker to handle everything. Here is a blog post demonstrating this: https://serverless.com/blog/serverless-python-packaging/

answered Oct 19 '22 by Foofy


Expanding a bit on Mask's answer. In the case of installing lxml in particular, the libxslt and libxml2 libraries are already installed on the AMI that executes the AWS Lambda. Therefore there is no need to start a subprocess with a different LD_LIBRARY_PATH as in that answer; it is, however, necessary to run pip install lxml on an Amazon Linux image (cross-compiling might be possible as well, but I don't know how).

Launch an ec2 machine with Amazon Linux ami
Run the following script to accumulate dependencies:
set -e -o pipefail
sudo yum -y upgrade
sudo yum -y install gcc python-devel libxml2-devel libxslt-devel

virtualenv ~/env && cd ~/env && source bin/activate
pip install lxml
for dir in lib64/python2.7/site-packages \
    lib/python2.7/site-packages
do
    if [ -d $dir ] ; then
        pushd $dir; zip -r ~/deps.zip .; popd
    fi
done 

Note that the last steps from Mask's answer are left out. You can use lxml straight from the Python file that contains the handler method.

answered Oct 19 '22 by Hans Peter Hagblom


AWS Lambda runs on a special version of Linux (as far as I can see).

Using "pip install a_package -t folder" is usually the right thing to do, as it packages your dependencies inside the archive that will be sent to Lambda. However, the libraries, and especially binary libraries, have to be compatible with the OS and Python version on Lambda.
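One way to see the mismatch concretely is to compare what your build machine targets with what Lambda runs; a compiled .so is only loadable when both match (a diagnostic sketch, nothing AWS-specific):

```python
import platform
import sysconfig

# A compiled extension like lxml's etree.so must match both the OS
# and the CPython build it was compiled against; on Lambda these are
# Linux / linux-x86_64, so a wheel built on macOS will not load there.
print(platform.system())         # e.g. 'Linux' on Lambda
print(sysconfig.get_platform())  # e.g. 'linux-x86_64'
```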

You could use the xml module included in Python: https://docs.python.org/2/library/xml.etree.elementtree.html
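If the built-in module covers your needs, parsing looks like this (a minimal sketch using only the standard library, so there is nothing compiled to bundle):

```python
import xml.etree.ElementTree as ET

# Parse a small document and extract element text and attributes;
# works on any Lambda runtime without packaging native libraries
doc = ET.fromstring("<items><item id='1'>foo</item><item id='2'>bar</item></items>")
texts = [el.text for el in doc.findall("item")]
print(texts)  # ['foo', 'bar']
```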

If you really need lxml, this link gives some tricks on how to compile shared libraries for Lambda : http://www.perrygeo.com/running-python-with-compiled-code-on-aws-lambda.html

answered Oct 19 '22 by Raphaël Braud


I was able to get this working by following the readme on this page:

  1. With docker installed, run this command (replacing python3.8 with the version of python you are using for your lambda function, and lxml with the version of lxml you want to use)
    $ docker run -v $(pwd):/outputs -it lambci/lambda:build-python3.8 \
          pip install lxml -t /outputs/
    
  2. This will create a folder called lxml in your working directory, and possibly some other folders which you can ignore. Move the lxml folder to the same directory as the .py file you are using as your lambda handler.
  3. Zip up the .py file with the lxml folder, as well as any packages if you are using a virtualenv. I had a virtualenv and lxml already existed in my site-packages folder, so I had to delete it first. Here are the commands I ran (note that my virtualenv v-env folder was in the same directory as my .py file):
    FUNCTION_NAME="name_of_your_python_file"
    cd v-env/lib/python3.8/site-packages &&
    rm -rf lxml &&
    rm -rf lxml-4.5.1.dist-info &&
    zip -r9 ${OLDPWD}/${FUNCTION_NAME}.zip . &&
    cd ${OLDPWD} &&
    zip -g ${FUNCTION_NAME}.zip ${FUNCTION_NAME}.py && 
    zip -r9 ${FUNCTION_NAME}.zip lxml
    
  4. If you don't have a virtualenv or any other dependencies, you can just run
    FUNCTION_NAME="name_of_your_python_file"
    zip -g ${FUNCTION_NAME}.zip ${FUNCTION_NAME}.py && 
    zip -r9 ${FUNCTION_NAME}.zip lxml
    
  5. Upload ${FUNCTION_NAME}.zip to your lambda function and use as normal.

More on creating a .zip file for lambda with a virtualenv here

answered Oct 19 '22 by Utkarsh Dalal