POS-Tagger is incredibly slow

I am using NLTK to generate n-grams from sentences after first removing given stop words. However, nltk.pos_tag() is extremely slow, taking roughly 0.6 seconds per sentence on my CPU (Intel i7).

The output:

['The first time I went, and was completely taken by the live jazz band and atmosphere, I ordered the Lobster Cobb Salad.']
0.620481014252
["It's simply the best meal in NYC."]
0.640982151031
['You cannot go wrong at the Red Eye Grill.']
0.644664049149

The code:

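import time

import nltk
from nltk.tokenize import word_tokenize
from nltk.util import ngrams

# source, stop_words and n are defined elsewhere in the program.
for sentence in source:

    nltk_ngrams = None

    if stop_words is not None:
        # Time a single pos_tag() call on the tokenized sentence.
        start = time.time()
        sentence_pos = nltk.pos_tag(word_tokenize(sentence))
        print(time.time() - start)

        # Keep only the words whose POS tag is not in stop_words.
        filtered_words = [word for (word, pos) in sentence_pos if pos not in stop_words]
    else:
        filtered_words = ngrams(sentence.split(), n)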

Is this really that slow or am I doing something wrong here?

asked Nov 12 '15 16:11

Stefan Falk

1 Answer

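Use pos_tag_sents to tag multiple sentences in one call instead of calling pos_tag once per sentence. The timings below suggest that most of the cost of a pos_tag call is fixed per-call overhead rather than the tagging itself (consistent with the tagger model being loaded on every call): ten separate calls take about ten times as long as one, while a single pos_tag_sents call over all ten sentences takes about as long as tagging one sentence: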

>>> import time
>>> from nltk.corpus import brown
>>> from nltk import pos_tag
>>> from nltk import pos_tag_sents
>>> sents = brown.sents()[:10]
>>> start = time.time(); pos_tag(sents[0]); print(time.time() - start)
0.934092998505
>>> start = time.time(); [pos_tag(s) for s in sents]; print(time.time() - start)
9.5061340332
>>> start = time.time(); pos_tag_sents(sents); print(time.time() - start)
0.939551115036
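
Applied to the loop in the question, the same idea might look like the sketch below: tokenize every sentence first, tag them all with one pos_tag_sents call, then filter by POS tag. This is only a sketch under assumptions, not the asker's actual code. filter_by_pos and its tokenize and tag_sents parameters are hypothetical names introduced here for illustration; word_tokenize and pos_tag_sents are the NLTK functions already used above.

```python
def filter_by_pos(sentences, excluded_tags, tokenize=None, tag_sents=None):
    """Return, per sentence, the words whose POS tag is not in excluded_tags.

    tokenize and tag_sents are hypothetical hooks added for this sketch; when
    omitted they default to NLTK's word_tokenize and pos_tag_sents.
    """
    if tokenize is None or tag_sents is None:
        # Imported lazily so the function can also be tried with stand-ins.
        from nltk import pos_tag_sents, word_tokenize
        tokenize = tokenize or word_tokenize
        tag_sents = tag_sents or pos_tag_sents

    # Tokenize everything first ...
    tokenized = [tokenize(sentence) for sentence in sentences]
    # ... then tag all sentences in a single batch call instead of one
    # pos_tag() call per sentence.
    tagged = tag_sents(tokenized)

    return [
        [word for (word, pos) in sentence_pos if pos not in excluded_tags]
        for sentence_pos in tagged
    ]
```

Calling filter_by_pos(source, stop_words) with the defaults requires NLTK and its tokenizer and tagger data to be installed. Each returned word list can then be passed to ngrams(words, n) as in the question.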

answered Dec 02 '22 07:12

alvas

Tags: python, nlp, nltk, pos-tagger
