I am writing a Python script that analyzes a piece of text and returns the data in JSON format. I am using NLTK to do the analysis. Basically, this is my flow:
Create an endpoint (API Gateway) -> calls my Lambda function -> returns JSON of the required data.
I wrote my script and deployed it to Lambda, but I ran into this issue:
Resource punkt not found. Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
Searched in:
- '/home/sbx_user1058/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/var/lang/nltk_data'
- '/var/lang/lib/nltk_data'
Even after downloading 'punkt', my script still gave me the same error. I tried the solutions here:
Optimizing python script extracting and processing large data files
but the issue is that the nltk_data folder is huge, while Lambda has a size restriction.
How can I fix this issue? Or where else could I run my script and still call it through an API?
I am using Serverless to deploy my Python scripts.
It depends on where you set the destination folder when you download the data using nltk.download(). On Windows 10, the default destination is either C:\Users\narae\nltk_data or C:\Users\narae\AppData\Roaming\nltk_data, but you can specify a different directory before downloading.
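For instance, a quick way to check and control this (the target path below just reuses the default from above for illustration):

import nltk

# Show the directories NLTK will search on this machine.
print(nltk.data.path)

# Download into an explicit directory instead of the interactive default.
nltk.download("punkt", download_dir=r"C:\Users\narae\nltk_data")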
There are two things that you can do:
First, ship the nltk_data with your function and point NLTK at it. Once you run nltk.download(), copy the downloaded directory into the root folder of your AWS Lambda application and name the directory "nltk_data". Then either add the path in code:

nltk.data.path.append(os.path.abspath('/var/task/nltk_data/'))

or, in the Lambda function dashboard (in the AWS console), add NLTK_DATA=./nltk_data as a key/value environment variable. (A fuller handler sketch follows below.)
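For context, here is a minimal handler sketch putting that first option together. The handler name, the event shape, and the use of word_tokenize are assumptions chosen to illustrate the idea, not the asker's actual code; only the /var/task/nltk_data path and the bundled-directory layout come from the steps above.

import json
import os
import nltk

# nltk_data/ is assumed to be packaged at the root of the deployment,
# which Lambda unpacks under /var/task (LAMBDA_TASK_ROOT).
nltk.data.path.append(os.path.join(os.environ.get("LAMBDA_TASK_ROOT", "/var/task"), "nltk_data"))

def handler(event, context):
    # Hypothetical API Gateway proxy event with a JSON body containing "text".
    body = json.loads(event.get("body") or "{}")
    tokens = nltk.word_tokenize(body.get("text", ""))  # needs the punkt tokenizer data
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"tokens": tokens}),
    }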
Second, reduce the size of the nltk downloads, since you won't be needing all of them. Delete all the zip files and keep only the sections you need: for example, keep stopwords under nltk_data/corpora/stopwords and delete the rest, or if you need tokenizers, keep nltk_data/tokenizers/punkt. Most of these can be downloaded separately with python -m nltk.downloader punkt and then copied over; a packaging sketch follows below.
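A sketch of what that trimming could look like locally before deploying (the ./nltk_data target directory and the punkt/stopwords choice are assumptions; keep only whatever resources your script actually loads):

import glob
import os
import nltk

# Download just the needed resources into a local nltk_data directory
# that gets packaged at the root of the Lambda deployment.
nltk.download("punkt", download_dir="./nltk_data")
nltk.download("stopwords", download_dir="./nltk_data")

# The downloader leaves .zip archives next to the extracted folders;
# deleting them keeps the deployment package smaller.
for archive in glob.glob("./nltk_data/**/*.zip", recursive=True):
    os.remove(archive)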