I have multiple texts and I would like to create profiles of them based on their usage of various parts of speech, like nouns and verbs. Basically, I need to count how many times each part of speech is used.
I have tagged the text but am not sure how to proceed:
import nltk

tokens = nltk.word_tokenize(text.lower())  # `text` is the raw input string
tags = nltk.pos_tag(tokens)  # wrapping tokens in nltk.Text is unnecessary for tagging
How can I save the counts for each part of speech into a variable?
The Natural Language Toolkit (NLTK) is a platform for building Python programs that analyze text. One of the more powerful features of the NLTK module is its part-of-speech tagging. To run the Python program below, you must have NLTK installed.
The universal tagset in NLTK comprises 12 tag classes: verbs, nouns, pronouns, adjectives, adverbs, adpositions, conjunctions, determiners, cardinal numbers, particles, other/foreign words, and punctuation.
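If you want your counts at that coarse level rather than the detailed Penn Treebank tags, pos_tag accepts a tagset argument (a sketch; it assumes you have downloaded NLTK's universal_tagset data, and the exact tags can vary with the tagger model):
>>> nltk.pos_tag(nltk.word_tokenize('the dog sees the cat'), tagset='universal')
[('the', 'DET'), ('dog', 'NOUN'), ('sees', 'VERB'), ('the', 'DET'), ('cat', 'NOUN')]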
The pos_tag function gives you back a list of (token, tag) pairs:
tagged = [('the', 'DT'), ('dog', 'NN'), ('sees', 'VB'), ('the', 'DT'), ('cat', 'NN')]
If you are using Python 2.7 or later, then you can do it simply with:
>>> from collections import Counter
>>> counts = Counter(tag for word,tag in tagged)
>>> counts
Counter({'DT': 2, 'NN': 2, 'VB': 1})
To normalize the counts (giving you the proportion of each tag), do:
>>> total = sum(counts.values())
>>> dict((tag, float(count)/total) for tag, count in counts.items())
{'DT': 0.4, 'VB': 0.2, 'NN': 0.4}
Note that in older versions of Python, you'll have to implement Counter yourself:
>>> from collections import defaultdict
>>> counts = defaultdict(int)
>>> for word, tag in tagged:
...     counts[tag] += 1
>>> counts
defaultdict(<type 'int'>, {'DT': 2, 'VB': 1, 'NN': 2})
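Putting it all together for your multiple-texts use case, here is a minimal Python 3 sketch (pos_profile and texts are illustrative names, not part of NLTK; it assumes the punkt, averaged_perceptron_tagger, and universal_tagset NLTK data packages have been downloaded):

import nltk
from collections import Counter

# Assumes NLTK data packages are installed, e.g. via
# nltk.download('punkt'), nltk.download('averaged_perceptron_tagger'),
# nltk.download('universal_tagset')

def pos_profile(text):
    """Return the proportion of each universal part of speech in `text`."""
    tokens = nltk.word_tokenize(text.lower())
    tags = [tag for _, tag in nltk.pos_tag(tokens, tagset='universal')]
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: count / total for tag, count in counts.items()}

texts = ["The dog sees the cat.", "Run, Spot, run!"]  # your corpus goes here
profiles = [pos_profile(t) for t in texts]

Each entry in profiles maps a tag to its share of that text, so the profiles are directly comparable across texts of different lengths.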