SpaCy model won't load in AWS Lambda

Tags: aws-lambda, spacy

Has anyone gotten SpaCy 2.0 to work in AWS Lambda? I have everything zipped and packaged correctly, since I can get a generic string to return from my lambda function if I test it. But when I do the simple function below to test, it stalls for about 10 seconds and then returns empty, and I don't get any error messages. I did set my Lambda timeout at 60 seconds so that isn't the problem.

import spacy

nlp = spacy.load('en_core_web_sm') #model package included

def lambda_handler(event, context):
    doc = nlp(u'They are')
    msg = doc[0].lemma_
    return msg

When I load the model package without using it, it also returns empty, but if I comment it out it sends me the string as expected, so it has to be something about loading the model.

import spacy

nlp = spacy.load('en_core_web_sm') #model package included

def lambda_handler(event, context):
    msg = 'message returned'
    return msg

asked Dec 19 '17 02:12

pynewb

3 Answers

To speed up model loading, store the model on S3, download it to the /tmp folder in Lambda with your own script, and then load it into spaCy from there.

Downloading the model from S3 and running takes about 5 seconds. A good optimization is to keep the model on the warm container and check whether it has already been downloaded. On a warm container the code takes 0.8 seconds to run.

Here is the link to the code and package with example: https://github.com/ryfeus/lambda-packs/blob/master/Spacy/source2.7/index.py

import spacy
import boto3
import os


def download_dir(client, resource, dist, local='/tmp', bucket='s3bucket'):
    # Recursively mirror every object under the `dist` prefix into `local`.
    paginator = client.get_paginator('list_objects')
    for result in paginator.paginate(Bucket=bucket, Delimiter='/', Prefix=dist):
        # With a delimiter, nested "folders" are reported as CommonPrefixes.
        for subdir in result.get('CommonPrefixes') or []:
            download_dir(client, resource, subdir.get('Prefix'), local, bucket)
        for obj in result.get('Contents') or []:
            target = local + os.sep + obj.get('Key')
            if not os.path.exists(os.path.dirname(target)):
                os.makedirs(os.path.dirname(target))
            resource.meta.client.download_file(bucket, obj.get('Key'), target)

def handler(event, context):
    client = boto3.client('s3')
    resource = boto3.resource('s3')
    # /tmp is kept on a warm container, so the download only runs on a cold start.
    if not os.path.isdir("/tmp/en_core_web_sm"):
        download_dir(client, resource, 'en_core_web_sm', '/tmp', 'ryfeus-spacy')
    spacy.util.set_data_path('/tmp')
    nlp = spacy.load('/tmp/en_core_web_sm/en_core_web_sm-2.0.0')
    doc = nlp(u'Apple is looking at buying U.K. startup for $1 billion')
    for token in doc:
        print(token.text, token.pos_, token.dep_)
    return 'finished'
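
The warm-container idea can be taken one step further by also caching the loaded model object, not just the downloaded files. The following is a minimal sketch of that pattern, not part of the linked package: `get_model`, `download` and `load` are hypothetical names, and in a real handler `download` would presumably wrap `download_dir` while `load` would be `spacy.load`.

```python
import os

# Module-level state survives between invocations while the container stays warm.
_MODEL = None


def get_model(model_dir, download, load):
    """Return the cached model; download and load it only when needed.

    `download` and `load` are caller-supplied callables (hypothetical
    stand-ins for an S3 download helper and for spacy.load).
    """
    global _MODEL
    if _MODEL is None:
        # Files left in the model directory by an earlier invocation are reused.
        if not os.path.isdir(model_dir):
            download(model_dir)
        _MODEL = load(model_dir)
    return _MODEL
```

Calling `get_model` at the top of the handler means only the first invocation on a container pays for the download and the load; later invocations get the object that is already in memory.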

P.S. To package spacy within AWS Lambda you have to strip shared libraries.

answered Feb 13 '23 17:02

Ryfeus

Knew it was probably going to be something simple. The answer is that there wasn't enough memory allocated to run the Lambda function. I found that I had to increase it to at least 2816 MB, which is close to the maximum, to get the example above to work. It is notable that before last month it wasn't possible to go this high:

https://aws.amazon.com/about-aws/whats-new/2017/11/aws-lambda-doubles-maximum-memory-capacity-for-lambda-functions/

I turned it up to the max of 3008 MB to handle more text and everything seems to work just fine now.
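
Since running out of memory produced no error message here, it can help to log how close the function gets to its limit. This is a rough sketch, assuming the standard Lambda Python context attribute `memory_limit_in_mb` and the Linux convention that `ru_maxrss` is reported in kilobytes; `memory_report` is a hypothetical helper name.

```python
import resource


def memory_report(context):
    """Compare this process's peak memory with the configured Lambda limit."""
    # On Linux (which Lambda runs on), ru_maxrss is in kilobytes.
    peak_mb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0
    limit_mb = int(context.memory_limit_in_mb)
    return {
        'peak_mb': round(peak_mb, 1),
        'limit_mb': limit_mb,
        'headroom_mb': round(limit_mb - peak_mb, 1),
    }
```

Printing `memory_report(context)` at the end of the handler writes the numbers to the function's log, which makes it easier to see how much memory the model needs before raising the setting.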

answered Feb 13 '23 16:02

pynewb

What worked for me was cd-ing into <YOUR_ENV>/lib/Python<VERSION>/site-packages/ and removing the language models I didn't need. For example, I only needed the English language model, so once in my own site-packages directory I just needed to run `ls -d */ | grep -v en | xargs rm -rf`, and then zip up the contents to get it under Lambda's limits.
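
The same pruning step can be sketched in Python, which makes it easier to check what will be deleted before zipping. `prune_subdirs` is a hypothetical helper, and `parent` stands for whichever directory holds the per-language folders in your install (an assumption you need to verify first). Unlike `grep -v en`, which keeps any name containing "en", this version keeps only the exact names listed in `keep`.

```python
import os
import shutil


def prune_subdirs(parent, keep):
    """Delete every subdirectory of `parent` whose name is not in `keep`.

    Plain files are left alone. Returns the names that were removed.
    """
    removed = []
    for name in sorted(os.listdir(parent)):
        path = os.path.join(parent, name)
        if os.path.isdir(path) and name not in keep:
            shutil.rmtree(path)
            removed.append(name)
    return removed
```

For example, `prune_subdirs(lang_dir, keep={'en'})` would keep only the `en` folder, where `lang_dir` is the path you identified above. Run it on a copy of the environment first, since the deletion is not reversible.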

answered Feb 13 '23 15:02

Keith Johnson
