I am using NLTK to POS-tag hundreds of tweets in a web request. As you know, Django instantiates a request handler for each request.
I noticed that for a single request (~200 tweets), the first tweet takes ~18 seconds to tag, while every subsequent tweet takes ~120 milliseconds. What can I do to speed up the process?
Can I do a "pre-warming request" so that the module data is already loaded for each request?
import nltk

class MyRequestHandler(BaseHandler):
    def read(self, request):  # this runs for a GET request
        # ...in a loop over the tweets:
        tokens = nltk.word_tokenize(tweet)
        tagged = nltk.pos_tag(tokens)
One of the pre-processing/feature extraction steps is POS-tagging, whereby a grammatical word-type is assigned to each word in a text (in this case a tweet).
The main problem with POS tagging is ambiguity. In English, many common words have multiple meanings and therefore multiple possible POS tags. The job of a POS tagger is to resolve this ambiguity accurately based on the context of use. For example, the word "shot" can be a noun or a verb.
POS tagging, then, is the task of finding the sequence of tags most likely to have generated a given word sequence. We can model this with a Hidden Markov Model (HMM), where the tags are the hidden states that produce the observable output, i.e., the words.
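As a concrete sketch of this HMM view, NLTK ships a supervised HMM trainer; the example below (which assumes the bundled Penn Treebank sample and the punkt tokenizer models have been fetched via nltk.download) trains a small HMM tagger and applies it:

import nltk
from nltk.corpus import treebank
from nltk.tag import hmm

# Tags are the hidden states, words are the observed emissions.
train_sents = treebank.tagged_sents()[:3000]
hmm_tagger = hmm.HiddenMarkovModelTrainer().train_supervised(train_sents)
print(hmm_tagger.tag(nltk.word_tokenize("He shot the ball")))

A tagger trained on such a small sample will be rough on out-of-vocabulary tweet text, but it makes the hidden-state/emission structure concrete.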
In NLTK, POS tagging marks up each word in a text with its part of speech, based on the word's definition and context. Examples of tags in NLTK's tag set are CC, CD, EX, JJ, MD, NNP, PDT, PRP$, and TO. The tagger thus assigns grammatical information to each word of the sentence.
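To see the tagger resolving the "shot" ambiguity from above in practice (the exact output may vary with the NLTK version and the tagger models installed):

import nltk

# "shot" as a verb vs. as a noun; the tagger decides from context.
print(nltk.pos_tag(nltk.word_tokenize("She shot the arrow.")))
# e.g. [('She', 'PRP'), ('shot', 'VBD'), ('the', 'DT'), ('arrow', 'NN'), ('.', '.')]
print(nltk.pos_tag(nltk.word_tokenize("He took a shot.")))
# e.g. [('He', 'PRP'), ('took', 'VBD'), ('a', 'DT'), ('shot', 'NN'), ('.', '.')]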
Those first 18 seconds are the POS tagger being unpickled from disk into RAM. If you want to get around this, load the tagger yourself outside of a request function.
import nltk.data, nltk.tag

# load the default tagger once, at module import time
tagger = nltk.data.load(nltk.tag._POS_TAGGER)
Then replace calls to nltk.pos_tag with tagger.tag.
The tradeoff is that app startup will now take ~18 seconds longer.
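Putting it together, a minimal sketch of the handler with the tagger held at module level. Note that nltk.tag._POS_TAGGER was removed in later NLTK releases; on NLTK 3.x the closest equivalent, as far as I know, is to instantiate the pretrained PerceptronTagger once, as below:

import nltk
from nltk.tag.perceptron import PerceptronTagger

# Loaded once when the module is imported, not on every request.
# On older NLTK versions, use: tagger = nltk.data.load(nltk.tag._POS_TAGGER)
tagger = PerceptronTagger()

class MyRequestHandler(BaseHandler):  # BaseHandler as in the question
    def read(self, request):
        # ...in a loop over the tweets:
        tokens = nltk.word_tokenize(tweet)
        tagged = tagger.tag(tokens)  # reuses the already-loaded model

This way the unpickling cost is paid once at startup rather than on the first request of each worker process.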