How to package Scrapy dependency to lambda?

Question

I am writing a Python application that depends on the Scrapy module. It works fine locally but fails when I run it from the AWS Lambda test console. My Python project has a requirements.txt file with the following dependency:

scrapy==1.6.0

I packaged all dependencies by following this guide: https://docs.aws.amazon.com/lambda/latest/dg/lambda-python-how-to-create-deployment-package.html. I also put my source files (*.py) at the root level of the zip file. My packaging script can be found at https://github.com/zhaoyi0113/quote-datalake/blob/master/bin/deploy.sh.

It basically does two things: first, it runs pip install -r requirements.txt -t dist to download all dependencies into the dist directory; second, it copies the app's Python source code into the dist directory.
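The two packaging steps described above could be sketched in Python roughly as follows. This is not the contents of deploy.sh, only an illustration of the same idea; the function names (build_pip_command, copy_sources, package) and the default paths are assumptions made for the example.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def build_pip_command(requirements="requirements.txt", target="dist"):
    """Return the pip command that installs every requirement into `target`."""
    return [sys.executable, "-m", "pip", "install",
            "-r", str(requirements), "-t", str(target)]


def copy_sources(src_dir, dist_dir):
    """Copy the top-level *.py files of src_dir into the root of dist_dir."""
    dist = Path(dist_dir)
    dist.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(src_dir).glob("*.py")):
        shutil.copy2(path, dist / path.name)
        copied.append(path.name)
    return copied


def package(src_dir=".", requirements="requirements.txt", dist_dir="dist"):
    """Step 1: install dependencies into dist_dir. Step 2: copy app sources."""
    subprocess.run(build_pip_command(requirements, dist_dir), check=True)
    return copy_sources(src_dir, dist_dir)
```

Calling package() from the project directory would fill dist with the dependencies and the handler source. Note that pip installs binaries built for the machine it runs on, so a dist directory produced on macOS or Windows may not match the Lambda runtime.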

The deployment is done via Terraform, using the configuration file below.

provider "aws" {
  profile    = "default"
  region     = "ap-southeast-2"
}

variable "runtime" {
  default = "python3.6"
}

data "archive_file" "zipit" {
    type        = "zip"
    source_dir  = "crawler/dist"
    output_path = "crawler/dist/deploy.zip"
}
resource "aws_lambda_function" "test_lambda" {
  filename      = "crawler/dist/deploy.zip"
  function_name = "quote-crawler"
  role          = "arn:aws:iam::773592622512:role/LambdaRole"
  handler       = "handler.handler"
  source_code_hash = "${data.archive_file.zipit.output_base64sha256}"
  runtime = "${var.runtime}"
}

It zips the directory and uploads the file to Lambda.

I get the following runtime error in Lambda whenever the code contains the statement import scrapy: Unable to import module 'handler': cannot import name 'etree'. I don't use etree in my own code, so I believe it is something used by Scrapy.
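One way to narrow down an import failure like this is to deploy a small diagnostic handler that attempts each import separately and reports the result instead of crashing. The sketch below is only an illustration: probe_import and environment_report are hypothetical helper names, and the list of modules to probe is an assumption based on Scrapy's known use of lxml and pyOpenSSL.

```python
import importlib
import platform
import sys


def probe_import(module_name):
    """Try to import a module and report the outcome instead of raising."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or a loader error from a mismatched binary
        return {"module": module_name, "ok": False,
                "error": "{}: {}".format(type(exc).__name__, exc)}
    return {"module": module_name, "ok": True,
            "file": getattr(module, "__file__", None)}


def environment_report(modules=("lxml.etree", "OpenSSL", "scrapy")):
    """Collect interpreter and platform details plus per-module import results."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "imports": [probe_import(name) for name in modules],
    }


def handler(event, context):
    """Hypothetical Lambda entry point that returns the report as the result."""
    return environment_report()
```

Running environment_report() both locally and in Lambda and comparing the two outputs should show which module fails and on which platform it was loaded.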

My source code can be found at https://github.com/zhaoyi0113/quote-datalake/tree/master/crawler. There are only two simple python files.

It works fine if I run them locally. The error only appears in lambda. Is there a different way to package scrapy to lambda?

Joey Yi Zhao · Accepted Answer

Based on the discussion with Tim, the issue is caused by incompatible library versions between my local machine and Lambda.

The easiest way to resolve this is to build the package inside the lambci/lambda Docker image with the command:

$ docker run -v $(pwd):/outputs -it --rm lambci/lambda:build-python3.6 pip install scrapy -t /outputs/

Tim · Answer

You need to provide the entire dependency tree: Scrapy has its own set of dependencies (and they may have dependencies too).

The easiest way to download all the required dependencies is to use pip:

$ pip install scrapy -t packages/

This will download scrapy and all its dependencies into the folder packages.

Scrapy depends on lxml and pyOpenSSL, both of which include compiled components. Unless they are statically compiled, the C libraries they depend on will likely also need to be installed on the Lambda VM.
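To see which packages in the deployment directory actually ship compiled components, the directory can be scanned for extension-module files. The following is a rough sketch under stated assumptions: find_compiled_modules and looks_like_linux_build are hypothetical names, and the filename check is only a heuristic based on CPython's usual naming (for example etree.cpython-36m-x86_64-linux-gnu.so on Linux versus etree.cpython-36m-darwin.so on macOS).

```python
from pathlib import Path

# Suffixes commonly used for compiled modules: .so (Linux/macOS),
# .pyd (Windows), .dylib (macOS shared libraries).
BINARY_SUFFIXES = {".so", ".pyd", ".dylib"}


def find_compiled_modules(package_dir):
    """Return relative paths of compiled files found under package_dir."""
    root = Path(package_dir)
    found = []
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in BINARY_SUFFIXES:
            found.append(path.relative_to(root).as_posix())
    return sorted(found)


def looks_like_linux_build(filename):
    """Heuristic only: True if the file name carries a Linux platform tag."""
    return "linux" in Path(filename).name


def report(package_dir):
    """Map each compiled file to whether it appears to be a Linux build."""
    return {name: looks_like_linux_build(name)
            for name in find_compiled_modules(package_dir)}
```

Any entry that report("dist") marks as False was probably built for a different platform than Lambda and is a likely cause of import errors such as the one for etree. Files without a platform tag in their name will also show as False, so treat the result as a hint rather than proof.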

From the lxml documentation it requires:

libxml2 version 2.9.2 or later.
libxslt version 1.1.27 or later. We recommend libxslt 1.1.28 or later.

Maybe try adding installation of these to your deploy script. You should be able to use (I'm making a guess at the package names) yum -y install libxml2 libxslt

Another good idea is to test your scripts on an Amazon Linux EC2 instance, as this is close to the environment that Lambda executes in.
