I want to get the POS tags of sentences one at a time, like this:
def __remove_stop_words(self, tokenized_text, stop_words):
    sentences_pos = nltk.pos_tag(tokenized_text)
    filtered_words = [word for (word, pos) in sentences_pos
                      if pos not in stop_words and word not in stop_words]
    return filtered_words
The problem is that each call to pos_tag() takes about a second per sentence. There is the option of using pos_tag_sents() to do this batch-wise and speed things up, but my life would be easier if I could tag sentence by sentence. Is there a way to do this faster?
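For reference, a quick way to reproduce the timing I'm seeing (the sentence here is only an example):

import timeit
import nltk

tokens = nltk.word_tokenize("This is a short example sentence.")
# average seconds per pos_tag() call
print(timeit.timeit(lambda: nltk.pos_tag(tokens), number=5) / 5)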
For nltk version 3.1, pos_tag is defined inside nltk/tag/__init__.py like this:
from nltk.tag.perceptron import PerceptronTagger

def pos_tag(tokens, tagset=None):
    tagger = PerceptronTagger()
    return _pos_tag(tokens, tagset, tagger)
So each call to pos_tag first instantiates PerceptronTagger, which takes some time because it involves loading a pickle file. _pos_tag simply calls tagger.tag when tagset is None.
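(For reference, _pos_tag in that version boils down to roughly the following; this is a paraphrase, not the exact source:)

from nltk.tag.mapping import map_tag

def _pos_tag(tokens, tagset, tagger):
    tagged_tokens = tagger.tag(tokens)   # the actual tagging work
    if tagset:                           # only remap when a target tagset is requested
        tagged_tokens = [(token, map_tag('en-ptb', tagset, tag))
                         for (token, tag) in tagged_tokens]
    return tagged_tokens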
So you can save some time by loading the file once, and calling tagger.tag yourself instead of calling pos_tag:
from nltk.tag.perceptron import PerceptronTagger

tagger = PerceptronTagger()

def __remove_stop_words(self, tokenized_text, stop_words, tagger=tagger):
    sentences_pos = tagger.tag(tokenized_text)
    filtered_words = [word for (word, pos) in sentences_pos
                      if pos not in stop_words and word not in stop_words]
    return filtered_words
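Once the tagger is loaded, repeated calls to tagger.tag are cheap. For example (the output shown is illustrative):

tokens = ['This', 'is', 'a', 'sentence']
print(tagger.tag(tokens))
# e.g. [('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('sentence', 'NN')]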
pos_tag_sents uses the same trick as above: it instantiates PerceptronTagger once before calling _pos_tag many times. So you'll get a comparable gain in performance using the above code as you would by refactoring and calling pos_tag_sents.
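For comparison, a batch call would look like this (the sentences are illustrative):

from nltk import pos_tag_sents

sentences = [['This', 'is', 'one', 'sentence'],
             ['And', 'here', 'is', 'another']]
# the tagger is instantiated once for the whole batch
tagged = pos_tag_sents(sentences)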
Also, if stop_words is a long list, you may save a bit of time by making stop_words a set:
stop_words = set(stop_words)
since checking membership in a set (e.g. pos not in stop_words) is an O(1) (constant-time) operation, while checking membership in a list is an O(n) operation, i.e. it requires time that grows proportionally to the length of the list.
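A quick, made-up benchmark illustrates the difference:

import timeit

words = [str(i) for i in range(10000)]
word_set = set(words)

# probe an item near the end of the list (worst case for the list)
print(timeit.timeit(lambda: '9999' in words, number=1000))     # O(n) per lookup
print(timeit.timeit(lambda: '9999' in word_set, number=1000))  # O(1) per lookup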