I am using NLTK to POS-tag hundreds of tweets in a web request. As you know, Django instantiates a request handler for each request.
I noticed that for a single request (~200 tweets), the first tweet takes ~18 seconds to tag, while every subsequent tweet takes ~120 milliseconds. What can I do to speed up the process?
Can I do a "pre-warming request" so that the module data is already loaded for each request?
import nltk

class MyRequestHandler(BaseHandler):
    def read(self, request):  # this runs for a GET request
        # ...in a loop over the tweets:
        tokens = nltk.word_tokenize(tweet)
        tagged = nltk.pos_tag(tokens)
One of the pre-processing/feature extraction steps is POS-tagging, whereby a grammatical word-type is assigned to each word in a text (in this case a tweet).
The main problem with POS tagging is ambiguity. In English, many common words have multiple meanings and therefore multiple possible POS tags. The job of a POS tagger is to resolve this ambiguity accurately based on the context of use. For example, the word "shot" can be a noun or a verb.
POS tagging, then, is the task of finding the sequence of tags most likely to have generated a given word sequence. We can model this with a Hidden Markov Model (HMM), where the tags are the hidden states that produce the observable output, i.e., the words.
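As a concrete sketch of this HMM view, NLTK ships a supervised HMM trainer; the example below (which assumes the bundled Penn Treebank sample and the punkt tokenizer models have been fetched via nltk.download) trains a small HMM tagger and applies it:

import nltk
from nltk.corpus import treebank
from nltk.tag import hmm

# Tags are the hidden states, words are the observed emissions.
train_sents = treebank.tagged_sents()[:3000]
hmm_tagger = hmm.HiddenMarkovModelTrainer().train_supervised(train_sents)
print(hmm_tagger.tag(nltk.word_tokenize("He shot the ball")))

A tagger trained on such a small sample will be rough on out-of-vocabulary tweet text, but it makes the hidden-state/emission structure concrete.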
In NLTK, POS tagging marks up each word in a text with its part of speech, based on the word's definition and context. Examples of tags in NLTK's tag set are CC, CD, EX, JJ, MD, NNP, PDT, PRP$, and TO. The tagger thus assigns grammatical information to each word of the sentence.
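To see the tagger resolving the "shot" ambiguity from above in practice (the exact output may vary with the NLTK version and the tagger models installed):

import nltk

# "shot" as a verb vs. as a noun; the tagger decides from context.
print(nltk.pos_tag(nltk.word_tokenize("She shot the arrow.")))
# e.g. [('She', 'PRP'), ('shot', 'VBD'), ('the', 'DT'), ('arrow', 'NN'), ('.', '.')]
print(nltk.pos_tag(nltk.word_tokenize("He took a shot.")))
# e.g. [('He', 'PRP'), ('took', 'VBD'), ('a', 'DT'), ('shot', 'NN'), ('.', '.')]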
Those first 18 seconds are the POS tagger being unpickled from disk into RAM. If you want to get around this, load the tagger yourself outside of a request function.
import nltk.data, nltk.tag

# load the default tagger once, at module import time
tagger = nltk.data.load(nltk.tag._POS_TAGGER)
Then replace calls to nltk.pos_tag with tagger.tag.
The tradeoff is that app startup will now take ~18 seconds longer.
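Putting it together, a minimal sketch of the handler with the tagger held at module level. Note that nltk.tag._POS_TAGGER was removed in later NLTK releases; on NLTK 3.x the closest equivalent, as far as I know, is to instantiate the pretrained PerceptronTagger once, as below:

import nltk
from nltk.tag.perceptron import PerceptronTagger

# Loaded once when the module is imported, not on every request.
# On older NLTK versions, use: tagger = nltk.data.load(nltk.tag._POS_TAGGER)
tagger = PerceptronTagger()

class MyRequestHandler(BaseHandler):  # BaseHandler as in the question
    def read(self, request):
        # ...in a loop over the tweets:
        tokens = nltk.word_tokenize(tweet)
        tagged = tagger.tag(tokens)  # reuses the already-loaded model

This way the unpickling cost is paid once at startup rather than on the first request of each worker process.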