I am writing a Python script that analyzes a piece of text and returns the data in JSON format. I am using NLTK to do the analysis. Basically, this is my flow:
Create an endpoint (API Gateway) -> calls my Lambda function -> returns JSON of the required data.
I wrote my script and deployed it to Lambda, but I ran into this issue:
Resource punkt not found. Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
Searched in:
- '/home/sbx_user1058/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/var/lang/nltk_data'
- '/var/lang/lib/nltk_data'
Even after downloading 'punkt', my script still gave me the same error. I tried the solutions here:
Optimizing python script extracting and processing large data files
but the issue is that the nltk_data folder is huge, while Lambda has a size restriction.
How can I fix this issue? Or where else could I run my script and still call it through an API?
I am using Serverless to deploy my Python scripts.
It depends on where you set the destination folder when you download the data using nltk.download(). On Windows 10, the default destination is either C:\Users\narae\nltk_data or C:\Users\narae\AppData\Roaming\nltk_data, but you can specify a different directory before downloading.
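For instance, a quick way to check and control this (the target path below just reuses the default from above for illustration):

import nltk

# Show the directories NLTK will search on this machine.
print(nltk.data.path)

# Download into an explicit directory instead of the interactive default.
nltk.download("punkt", download_dir=r"C:\Users\narae\nltk_data")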
There are two things that you can do:
First, ship the nltk_data with your function and point NLTK at it. Once you run nltk.download(), copy the downloaded directory into the root folder of your AWS Lambda application and name the directory "nltk_data". Then either add the path in code:

nltk.data.path.append(os.path.abspath('/var/task/nltk_data/'))

or, in the Lambda function dashboard (in the AWS console), add NLTK_DATA=./nltk_data as a key/value environment variable. (A fuller handler sketch follows below.)
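For context, here is a minimal handler sketch putting that first option together. The handler name, the event shape, and the use of word_tokenize are assumptions chosen to illustrate the idea, not the asker's actual code; only the /var/task/nltk_data path and the bundled-directory layout come from the steps above.

import json
import os
import nltk

# nltk_data/ is assumed to be packaged at the root of the deployment,
# which Lambda unpacks under /var/task (LAMBDA_TASK_ROOT).
nltk.data.path.append(os.path.join(os.environ.get("LAMBDA_TASK_ROOT", "/var/task"), "nltk_data"))

def handler(event, context):
    # Hypothetical API Gateway proxy event with a JSON body containing "text".
    body = json.loads(event.get("body") or "{}")
    tokens = nltk.word_tokenize(body.get("text", ""))  # needs the punkt tokenizer data
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"tokens": tokens}),
    }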
Second, reduce the size of the nltk downloads, since you won't be needing all of them. Delete all the zip files and keep only the sections you need: for example, keep stopwords under nltk_data/corpora/stopwords and delete the rest, or if you need tokenizers, keep nltk_data/tokenizers/punkt. Most of these can be downloaded separately with python -m nltk.downloader punkt and then copied over; a packaging sketch follows below.
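A sketch of what that trimming could look like locally before deploying (the ./nltk_data target directory and the punkt/stopwords choice are assumptions; keep only whatever resources your script actually loads):

import glob
import os
import nltk

# Download just the needed resources into a local nltk_data directory
# that gets packaged at the root of the Lambda deployment.
nltk.download("punkt", download_dir="./nltk_data")
nltk.download("stopwords", download_dir="./nltk_data")

# The downloader leaves .zip archives next to the extracted folders;
# deleting them keeps the deployment package smaller.
for archive in glob.glob("./nltk_data/**/*.zip", recursive=True):
    os.remove(archive)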