
SpaCy model won't load in AWS Lambda

Has anyone gotten SpaCy 2.0 to work in AWS Lambda? I have everything zipped and packaged correctly, since I can get a generic string to return from my lambda function if I test it. But when I do the simple function below to test, it stalls for about 10 seconds and then returns empty, and I don't get any error messages. I did set my Lambda timeout at 60 seconds so that isn't the problem.

import spacy

nlp = spacy.load('en_core_web_sm') #model package included

def lambda_handler(event, context):
    doc = nlp(u'They are')
    msg = doc[0].lemma_
    return msg

When I load the model package without using it, the function also returns empty, but if I comment out the spacy.load line it returns the string as expected, so it has to be something about loading the model.

import spacy

nlp = spacy.load('en_core_web_sm') #model package included

def lambda_handler(event, context):
    msg = 'message returned'
    return msg

pynewb asked Dec 19 '17 02:12




3 Answers

To speed up model loading, store the model on S3, download it with your own script to the /tmp folder in Lambda, and then load it into spaCy from there.

It takes about 5 seconds to download the model from S3 and run. A good optimization is to keep the model on a warm container and check whether it has already been downloaded. On a warm container the code takes 0.8 seconds to run.

Here is the link to the code and package with example: https://github.com/ryfeus/lambda-packs/blob/master/Spacy/source2.7/index.py

import spacy
import boto3
import os


def download_dir(client, resource, dist, local='/tmp', bucket='s3bucket'):
    # Recursively download every object under the S3 prefix `dist` into `local`.
    paginator = client.get_paginator('list_objects')
    for result in paginator.paginate(Bucket=bucket, Delimiter='/', Prefix=dist):
        # CommonPrefixes are the "subdirectories" below the current prefix.
        for subdir in result.get('CommonPrefixes') or []:
            download_dir(client, resource, subdir.get('Prefix'), local, bucket)
        for obj in result.get('Contents') or []:
            target = local + os.sep + obj.get('Key')
            if not os.path.exists(os.path.dirname(target)):
                os.makedirs(os.path.dirname(target))
            resource.meta.client.download_file(bucket, obj.get('Key'), target)

def handler(event, context):
    client = boto3.client('s3')
    resource = boto3.resource('s3')
    # Download only if a previous invocation on this container has not already done so.
    if not os.path.isdir('/tmp/en_core_web_sm'):
        download_dir(client, resource, 'en_core_web_sm', '/tmp', 'ryfeus-spacy')
    spacy.util.set_data_path('/tmp')
    nlp = spacy.load('/tmp/en_core_web_sm/en_core_web_sm-2.0.0')
    doc = nlp(u'Apple is looking at buying U.K. startup for $1 billion')
    for token in doc:
        print(token.text, token.pos_, token.dep_)
    return 'finished'

P.S. To package spacy within AWS Lambda you have to strip shared libraries.
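
The warm-container check described above can be taken one step further by also keeping the loaded model in a module-level variable, so a warm container skips both the download and the load. This is only a minimal sketch: `get_model`, `download` and `load` are hypothetical names, not part of spaCy or boto3, and the two callables stand in for something like `download_dir` and `spacy.load`.

```python
import os

_MODEL = None  # module-level state survives between invocations on a warm container


def get_model(model_dir, download, load):
    """Return the cached model, downloading and loading it only when needed.

    `download` and `load` are hypothetical callables supplied by the caller,
    for example thin wrappers around download_dir and spacy.load.
    """
    global _MODEL
    if _MODEL is None:
        if not os.path.isdir(model_dir):
            download(model_dir)   # cold container: fetch the files first
        _MODEL = load(model_dir)  # first call in this container: load once
    return _MODEL
```

Inside the handler this would be called as `nlp = get_model('/tmp/en_core_web_sm', my_download, my_load)`, where the two wrappers are whatever fits your bucket layout.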

Ryfeus answered Feb 13 '23 17:02


Knew it was probably going to be something simple. The answer is that there wasn't enough memory allocated to run the Lambda function. I found that I had to increase it to at least 2816 MB, close to the maximum, to get the example above to work. Note that before last month it wasn't possible to go this high:

https://aws.amazon.com/about-aws/whats-new/2017/11/aws-lambda-doubles-maximum-memory-capacity-for-lambda-functions/

I turned it up to the max of 3008 MB to handle more text and everything seems to work just fine now.
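
If you want to see how close a function gets to its memory limit, one option is to log the peak memory from inside the handler. This is a rough sketch, not a diagnosis tool: `peak_memory_mb` is a hypothetical helper name, and it only helps once the module has imported successfully, so it would not have caught the original failure while loading the model. The REPORT line in the function's CloudWatch logs also shows "Max Memory Used".

```python
import resource
import sys


def peak_memory_mb():
    """Peak resident memory of this process, in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024.0 * 1024.0 if sys.platform == "darwin" else 1024.0
    return peak / divisor


def lambda_handler(event, context):
    # memory_limit_in_mb is an attribute of the Lambda context object.
    print("peak %.0f MB of %s MB" % (peak_memory_mb(), context.memory_limit_in_mb))
    return "message returned"
```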

pynewb answered Feb 13 '23 16:02


What worked for me was cd-ing into <YOUR_ENV>/lib/Python<VERSION>/site-packages/ and removing the language models I didn't need. For example, I only needed the English language model, so once in my own site-packages directory I just ran `ls -d */ | grep -v en | xargs rm -rf` and then zipped up the contents to get under Lambda's limits.
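
The same cleanup can be sketched in Python if you would rather script it as part of a build. Assumptions: `prune_language_dirs` and `lang_root` are hypothetical names, and `lang_root` should point at whichever directory holds one subfolder per language. Unlike the `grep -v en` filter, which keeps any name containing "en", this version keeps only exact matches.

```python
import os
import shutil


def prune_language_dirs(lang_root, keep=("en",)):
    """Delete every subdirectory of lang_root whose name is not in keep.

    Plain files are left alone. Returns the names that were removed.
    """
    removed = []
    for name in sorted(os.listdir(lang_root)):
        path = os.path.join(lang_root, name)
        if os.path.isdir(path) and name not in keep:
            shutil.rmtree(path)
            removed.append(name)
    return removed
```

Run it against a copy of the environment first, since the deletion is not reversible.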

Keith Johnson answered Feb 13 '23 15:02