I am trying to use the LXML module within AWS Lambda and having no luck. I downloaded LXML using the following command:
pip install lxml -t folder
To download it to my lambda function deployment package. I zipped the contents of my lambda function up as I have done with all other lambda functions, and uploaded it to AWS Lambda.
However no matter what I try I get this error when I run the function:
Unable to import module 'handler': /var/task/lxml/etree.so: undefined symbol: PyFPE_jbuf
When I run it locally, I don't have any issues; it is only when I run it on Lambda that this error arises.
I have solved this using the serverless framework and its built-in Docker feature.
Requirement: You have an AWS profile in your .aws folder that can be accessed.
First, install the serverless framework as described here. You can then create a configuration file using the command serverless create --template aws-python3 --name my-lambda. It will create a serverless.yml file and a handler.py with a simple "hello" function. You can check that it works with sls deploy. If it does, serverless is ready to be worked with.
Next, we'll need an additional plugin named "serverless-python-requirements" for bundling Python requirements. You can install it via sls plugin install --name serverless-python-requirements.
This plugin is where all the magic happens that we need to solve the missing lxml package. In the custom->pythonRequirements section you simply have to add the dockerizePip: non-linux
property. Your serverless.yml file could look something like this:
service: producthunt-crawler
provider:
  name: aws
  runtime: python3.8
functions:
  hello:
    # some handler that imports lxml
    handler: handler.hello
plugins:
  - serverless-python-requirements
custom:
  pythonRequirements:
    fileName: requirements.txt
    dockerizePip: non-linux
    # Omits tests, __pycache__, *.pyc etc. from dependencies
    slim: true
This will run the bundling of python requirements inside a pre-configured docker container. After this, you can run sls deploy to see the magic happen, and then sls invoke -f my_function to check that it works.
If you used serverless to deploy before adding the dockerizePip: non-linux option, make sure to clean up your already built requirements with sls requirements clean. Otherwise, it just reuses the already built packages.
The lxml library is OS-dependent, so we need a precompiled copy built on Amazon Linux. Below are the steps.
Create a docker container:
docker run -it lambci/lambda:build-python3.8 bash
Create a dir named 'lib' (or anything you want) and install lxml into it:
mkdir lib
pip install lxml -t ./lib --no-deps
Open another terminal and run:
docker ps
Copy the container ID.
Copy the files from the container to the host:
mkdir -p /home/libraries/opt/python/lib/python3.8/site-packages/
docker cp <containerid>:/var/task/lib /home/libraries/opt/python/lib/python3.8/site-packages/
Now you have a copy of lxml compiled on an Amazon Linux box. If you would like to have lxml as a Lambda layer, navigate to /home/libraries/opt and zip the folder named python. Name the zip opt. Now you can attach the zip to your lambda as a layer.
If you want the lxml library inside the lambda package itself, navigate to /home/libraries/opt/python/lib/python3.8/site-packages/ and copy the lxml folder into your lambda.
I faced the same issue.
The link posted by Raphaël Braud was helpful and so was this one: https://nervous.io/python/aws/lambda/2016/02/17/scipy-pandas-lambda/
Using the two links I was able to successfully import lxml and other required packages. Here are the steps I followed:
Run the following script to accumulate dependencies:
set -e -o pipefail
sudo yum -y upgrade
sudo yum -y install gcc python-devel libxml2-devel libxslt-devel
virtualenv ~/env && cd ~/env && source bin/activate
pip install lxml
for dir in lib64/python2.7/site-packages \
lib/python2.7/site-packages
do
if [ -d $dir ] ; then
pushd $dir; zip -r ~/deps.zip .; popd
fi
done
mkdir -p local/lib
cp /usr/lib64/ #list of required .so files
local/lib/
zip -r ~/deps.zip local/lib
Create handler and worker files as specified in the link. Sample file contents:
handler.py
import os
import subprocess
libdir = os.path.join(os.getcwd(), 'local', 'lib')
def handler(event, context):
command = 'LD_LIBRARY_PATH={} python worker.py '.format(libdir)
output = subprocess.check_output(command, shell=True)
print output
return
worker.py:
import lxml
def sample_function( input_string = None):
return "lxml import successful!"
if __name__ == "__main__":
result = sample_function()
print result
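On newer Python 3 runtimes, the same handler/worker split can be sketched without shell string interpolation, by passing the environment to the child process explicitly (a sketch; the worker.py file and the local/lib directory come from the steps above):

```python
import os
import subprocess
import sys

# Directory holding the extra .so files bundled in the deployment package
LIBDIR = os.path.join(os.getcwd(), 'local', 'lib')

def handler(event, context):
    # Run the worker in a child process so the dynamic linker picks up
    # LD_LIBRARY_PATH before any shared libraries are loaded.
    env = dict(os.environ, LD_LIBRARY_PATH=LIBDIR)
    output = subprocess.check_output([sys.executable, 'worker.py'], env=env)
    return output.decode().strip()
```

Passing a list of arguments with env= avoids both shell quoting issues and the shell=True subprocess.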
Here is how the structure of the zip file looks after the above steps:
deps
├── handler.py
├── worker.py
├── local
│   └── lib
│       ├── libanl.so
│       ├── libBrokenLocale.so
│       ├── ...
├── lxml
│   ├── builder.py
│   ├── builder.pyc
│   ├── ...
├── <other python packages>
Hope this helps!
Extending on these answers, I found the following to work well.
The punchline here is having python compile lxml with static libs, and installing in the current directory rather than site-packages.
It also means you can write your python code as usual, with no need for a distinct worker.py or fiddling with LD_LIBRARY_PATH.
sudo yum groupinstall 'Development Tools'
sudo yum -y install python36-devel python36-pip
sudo ln -s /usr/bin/pip-3.6 /usr/bin/pip3
mkdir lambda && cd lambda
STATIC_DEPS=true pip3 install -t . lxml
zip -r ~/deps.zip *
To take it to the next level, use serverless and docker to handle everything. Here is a blog post demonstrating this: https://serverless.com/blog/serverless-python-packaging/
Expanding a bit on Mask's answer. In the case of installing lxml in particular, the libxslt and libxml2 libraries are already installed on the AMI that executes the AWS lambda. Therefore there is no need to start a subprocess with a different LD_LIBRARY_PATH as in that answer; it is, however, necessary to run pip install lxml on an AMI image (it might be possible to cross-compile as well, but I don't know how).
Launch an ec2 machine with Amazon Linux ami
Run the following script to accumulate dependencies:
set -e -o pipefail
sudo yum -y upgrade
sudo yum -y install gcc python-devel libxml2-devel libxslt-devel
virtualenv ~/env && cd ~/env && source bin/activate
pip install lxml
for dir in lib64/python2.7/site-packages \
lib/python2.7/site-packages
do
if [ -d $dir ] ; then
pushd $dir; zip -r ~/deps.zip .; popd
fi
done
Note that the last steps from Mask's answer are left out. You can use lxml straight from the python file that contains the handler method.
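For example, a handler along these lines works with no worker subprocess at all (a minimal sketch, assuming lxml was bundled into the deployment package as described):

```python
# handler.py - import lxml directly; libxml2/libxslt already exist on the
# Lambda execution environment, so no LD_LIBRARY_PATH tricks are needed.
from lxml import etree

def handler(event, context):
    # Parse a small document and pull out one element's text
    doc = etree.fromstring('<r><v>42</v></r>')
    return {'value': doc.findtext('v')}
```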
AWS Lambda uses a special version of Linux (as far as I can see).
Using "pip install a_package -t folder" is usually the right thing to do, as it packages your dependencies within the archive that will be sent to Lambda, but the libraries, and especially the binary libraries, have to be compatible with the version of OS and Python on Lambda.
You could use the xml module included in Python : https://docs.python.org/2/library/xml.etree.elementtree.html
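Since xml.etree.ElementTree ships with Python itself, there is nothing to compile or bundle; a minimal sketch of the stdlib-only route:

```python
# Stdlib-only alternative: no extra packages in the deployment zip at all.
import xml.etree.ElementTree as ET

def handler(event, context):
    # Parse with the standard library instead of lxml
    root = ET.fromstring('<feed><title>hello</title></feed>')
    return {'title': root.findtext('title')}
```

This covers basic parsing and XPath-lite queries; lxml is only needed for full XPath, XSLT, or very large documents.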
If you really need lxml, this link gives some tricks on how to compile shared libraries for Lambda : http://www.perrygeo.com/running-python-with-compiled-code-on-aws-lambda.html
I was able to get this working by following the readme on this page. (Replace python3.8 with the version of Python you are using for your lambda function, and lxml with the version of lxml you want to use.)
$ docker run -v $(pwd):/outputs -it lambci/lambda:build-python3.8 \
pip install lxml -t /outputs/
This creates an lxml folder in your working directory, and possibly some other folders which you can ignore. Move the lxml folder to the same directory as the .py file you are using as your lambda handler. Then zip the .py file together with the lxml folder, as well as any packages if you are using a virtualenv. I had a virtualenv, and lxml already existed in my site-packages folder, so I had to delete it first. Here are the commands I ran (note that my virtualenv v-env folder was in the same directory as my .py file):
FUNCTION_NAME="name_of_your_python_file"
cd v-env/lib/python3.8/site-packages &&
rm -rf lxml &&
rm -rf lxml-4.5.1.dist-info &&
zip -r9 ${OLDPWD}/${FUNCTION_NAME}.zip . &&
cd ${OLDPWD} &&
zip -g ${FUNCTION_NAME}.zip ${FUNCTION_NAME}.py &&
zip -r9 ${FUNCTION_NAME}.zip lxml
If you are not using a virtualenv, only these commands are needed:
FUNCTION_NAME="name_of_your_python_file"
zip -g ${FUNCTION_NAME}.zip ${FUNCTION_NAME}.py &&
zip -r9 ${FUNCTION_NAME}.zip lxml
More on creating a .zip file for lambda with a virtualenv here