I want to get the POS tags of sentences one by one, like this:
def __remove_stop_words(self, tokenized_text, stop_words):
    sentences_pos = nltk.pos_tag(tokenized_text)
    filtered_words = [word for (word, pos) in sentences_pos
                      if pos not in stop_words and word not in stop_words]
    return filtered_words
But the problem is that pos_tag() takes about a second for each sentence. Another option is to use pos_tag_sents() to do this batch-wise and speed things up, but my life would be easier if I could do this sentence by sentence.
Is there a way to do this faster?
For nltk version 3.1, inside nltk/tag/__init__.py, pos_tag is defined like this:
from nltk.tag.perceptron import PerceptronTagger

def pos_tag(tokens, tagset=None):
    tagger = PerceptronTagger()
    return _pos_tag(tokens, tagset, tagger)
So each call to pos_tag first instantiates PerceptronTagger, which takes some time because it involves loading a pickled model file. When tagset is None, _pos_tag simply calls tagger.tag.
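You can see where the time goes with a quick sketch like the one below (the sample tokens are made up for illustration; actual timings depend on your machine):

import time
from nltk.tag.perceptron import PerceptronTagger

start = time.time()
tagger = PerceptronTagger()  # loads the pickled model from disk
print('load model: %.3fs' % (time.time() - start))

start = time.time()
tagger.tag(['This', 'is', 'a', 'sentence', '.'])  # tagging itself is fast
print('tag tokens: %.3fs' % (time.time() - start))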
So you can save some time by loading the file once, and calling tagger.tag yourself instead of calling pos_tag:
from nltk.tag.perceptron import PerceptronTagger

tagger = PerceptronTagger()  # load the pickled model once, at import time

# binding the preloaded tagger as a default argument makes it a local name,
# so every call reuses the same instance instead of reloading the model
def __remove_stop_words(self, tokenized_text, stop_words, tagger=tagger):
    sentences_pos = tagger.tag(tokenized_text)
    filtered_words = [word for (word, pos) in sentences_pos
                      if pos not in stop_words and word not in stop_words]
    return filtered_words
pos_tag_sents uses the same trick: it instantiates PerceptronTagger once before calling _pos_tag many times. So the code above gives you a performance gain comparable to refactoring your code around pos_tag_sents.
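For reference, here is a minimal sketch of the batch variant (the sentences are made up for illustration):

import nltk

sentences = [['This', 'is', 'one', 'sentence', '.'],
             ['Here', 'is', 'another', '.']]
# pos_tag_sents builds the tagger once and tags every sentence with it
tagged = nltk.pos_tag_sents(sentences)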
Also, if stop_words is a long list, you may save a bit of time by making stop_words a set:
stop_words = set(stop_words)
since checking membership in a set (e.g. pos not in stop_words) is an O(1) (constant-time) operation, while checking membership in a list is O(n), i.e. it takes time proportional to the length of the list.
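A rough way to see the difference yourself (the word list here is illustrative; timings will vary by machine):

import timeit

setup = """
words = ['word%d' % i for i in range(10000)]
stop_list = list(words)
stop_set = set(words)
"""

# a list scans its elements one by one: O(n)
print(timeit.timeit("'word9999' in stop_list", setup=setup, number=1000))
# a set does a hash lookup: O(1) on average
print(timeit.timeit("'word9999' in stop_set", setup=setup, number=1000))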