Has anyone gotten spaCy 2.0 to work in AWS Lambda? I have everything zipped and packaged correctly, since I can get a generic string to return from my Lambda function when I test it. But when I run the simple function below as a test, it stalls for about 10 seconds and then returns empty, with no error messages. I did set my Lambda timeout at 60 seconds, so that isn't the problem.
import spacy

nlp = spacy.load('en_core_web_sm')  # model package included

def lambda_handler(event, context):
    doc = nlp(u'They are')
    msg = doc[0].lemma_
    return msg
When I load the model package without using it, it also returns empty, but if I comment the load out, it returns the string as expected, so it has to be something about loading the model.
import spacy

nlp = spacy.load('en_core_web_sm')  # model package included

def lambda_handler(event, context):
    msg = 'message returned'
    return msg
This error means that the spaCy module can't be located on your system or in your environment. Make sure you have spaCy installed. If you're using a virtual environment, make sure it's activated, and check that spaCy is installed in that environment; otherwise, you're trying to load a system installation.
To download and install the models manually, unpack the archive, drop the contained directory into spacy/data, and load the model via spacy.load('en') or spacy.load('de').
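As a quick sanity check (a minimal sketch, not part of the original answer), you can print the version and the path spaCy is actually being imported from, which makes it easy to spot a wrong environment or an accidental system install:

import spacy

# Shows which spaCy installation is being imported in this environment
print(spacy.__version__)
print(spacy.__file__)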
To optimize model loading, store the model on S3 and have your own script download it to Lambda's /tmp folder, then load it into spaCy from there. Downloading it from S3 and running takes about 5 seconds. A good optimization is to keep the model on the warm container and check whether it has already been downloaded; on a warm container the code takes about 0.8 seconds to run.
Here is the link to the code and package with example: https://github.com/ryfeus/lambda-packs/blob/master/Spacy/source2.7/index.py
import spacy
import boto3
import os

def download_dir(client, resource, dist, local='/tmp', bucket='s3bucket'):
    # Recursively download every object under the given prefix into the local dir
    paginator = client.get_paginator('list_objects')
    for result in paginator.paginate(Bucket=bucket, Delimiter='/', Prefix=dist):
        if result.get('CommonPrefixes') is not None:
            for subdir in result.get('CommonPrefixes'):
                download_dir(client, resource, subdir.get('Prefix'), local, bucket)
        if result.get('Contents') is not None:
            for file in result.get('Contents'):
                if not os.path.exists(os.path.dirname(local + os.sep + file.get('Key'))):
                    os.makedirs(os.path.dirname(local + os.sep + file.get('Key')))
                resource.meta.client.download_file(bucket, file.get('Key'), local + os.sep + file.get('Key'))

def handler(event, context):
    client = boto3.client('s3')
    resource = boto3.resource('s3')
    # Only download the model on a cold start; a warm container keeps /tmp
    if not os.path.isdir('/tmp/en_core_web_sm'):
        download_dir(client, resource, 'en_core_web_sm', '/tmp', 'ryfeus-spacy')
    spacy.util.set_data_path('/tmp')
    nlp = spacy.load('/tmp/en_core_web_sm/en_core_web_sm-2.0.0')
    doc = nlp(u'Apple is looking at buying U.K. startup for $1 billion')
    for token in doc:
        print(token.text, token.pos_, token.dep_)
    return 'finished'
P.S. To package spaCy within AWS Lambda's size limits you have to strip the shared libraries.
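For example, something along these lines can be run over the build directory before zipping (a hedged sketch: 'build' is a hypothetical path for the unzipped package, and it assumes the strip binary is available on PATH):

import os
import subprocess

# Walk the package directory and strip debug symbols from every shared
# library to shrink the deployment zip
for root, dirs, files in os.walk('build'):
    for name in files:
        if name.endswith('.so'):
            subprocess.call(['strip', os.path.join(root, name)])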
Knew it was probably going to be something simple. The answer is that there wasn't enough allocated memory to run the Lambda function: I found I had to increase it to nearly 2816 MB at minimum to get the example above to work. It's notable that until last month it wasn't possible to go this high:
https://aws.amazon.com/about-aws/whats-new/2017/11/aws-lambda-doubles-maximum-memory-capacity-for-lambda-functions/
I turned it up to the max of 3008 MB to handle more text, and everything seems to work just fine now.
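If you want to confirm what the function is actually running with, the Lambda context object exposes the configured memory, so a line like this at the top of the handler (a small sketch, not part of the original answer) makes under-allocation visible in the CloudWatch logs:

def lambda_handler(event, context):
    # memory_limit_in_mb is the memory configured for this function
    print('configured memory (MB):', context.memory_limit_in_mb)
    return 'ok'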
What worked for me was cd-ing into <YOUR_ENV>/lib/python<VERSION>/site-packages/ and removing the language models I didn't need. For example, I only needed the English language model, so once in my own site-packages directory I just had to run `ls -d */ | grep -v en | xargs rm -rf`, and then zip up the contents to get it under Lambda's limits.
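To verify the result before uploading, a quick size check on the zip helps (a minimal sketch; 'lambda_package.zip' is a hypothetical filename). Lambda enforces a 50 MB limit on directly uploaded zips and 250 MB unzipped:

import os

zip_path = 'lambda_package.zip'  # hypothetical name of the deployment package
size_mb = os.path.getsize(zip_path) / (1024.0 * 1024.0)
print('%s: %.1f MB (direct-upload limit is 50 MB zipped)' % (zip_path, size_mb))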