Python NLTK: How to tag sentences with the simplified set of part-of-speech tags?

Chapter 5 of the Python NLTK book gives this example of tagging words in a sentence:

    >>> import nltk
    >>> text = nltk.word_tokenize("And now for something completely different")
    >>> nltk.pos_tag(text)
    [('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'), ('completely', 'RB'), ('different', 'JJ')]

nltk.pos_tag calls the default tagger, which uses a full set of tags. Later in the chapter a simplified set of tags is introduced.

How can I tag sentences with this simplified set of part-of-speech tags?

Also, have I understood the tagger correctly? That is, can I change the tag set that the tagger uses, as I'm asking, or should I map the tags it returns onto the simplified set, or should I create a new tagger from a new, simply-tagged corpus?

asked Apr 26 '11 08:04

Ollie Glass

1 Answer

Updated, in case anyone runs across the same problem: NLTK has since moved to a "universal" tagset. Once you've tagged your text with the default tagger, use map_tag to map each tag onto the simplified set.

    import nltk
    from nltk.tag import pos_tag, map_tag

    # Tag with the default tagger first (full tag set)
    text = nltk.word_tokenize("And now for something completely different")
    posTagged = pos_tag(text)

    # Then map each tag from the 'en-ptb' set onto the 'universal' set
    simplifiedTags = [(word, map_tag('en-ptb', 'universal', tag)) for word, tag in posTagged]
    print(simplifiedTags)
    # [('And', 'CONJ'), ('now', 'ADV'), ('for', 'ADP'), ('something', 'NOUN'), ('completely', 'ADV'), ('different', 'ADJ')]
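To make the "map the tags it returns" option from the question concrete, here is a minimal sketch of that mapping step in plain Python, with no NLTK data required. The table `PTB_TO_UNIVERSAL` and the helper `simplify_tags` are hypothetical names written for this illustration; the table is hand-written and deliberately partial, and falling back to `X` for unlisted tags is an assumption of this sketch, not a statement about what map_tag does internally.

```python
# Hand-written, PARTIAL table from fine-grained tags to universal tags.
# Illustrative only: it covers the tags in the example sentence plus a few
# common ones, and is not NLTK's own mapping file.
PTB_TO_UNIVERSAL = {
    "CC": "CONJ",
    "RB": "ADV",
    "IN": "ADP",
    "NN": "NOUN",
    "NNS": "NOUN",
    "JJ": "ADJ",
    "VB": "VERB",
    "DT": "DET",
}


def simplify_tags(tagged, mapping=PTB_TO_UNIVERSAL, unknown="X"):
    """Return (word, coarse_tag) pairs.

    Tags missing from `mapping` become `unknown` (an assumption of this sketch).
    """
    return [(word, mapping.get(tag, unknown)) for word, tag in tagged]


# The pos_tag output quoted in the question, hard-coded so the sketch
# runs without downloading any tagger models.
pos_tagged = [
    ("And", "CC"), ("now", "RB"), ("for", "IN"),
    ("something", "NN"), ("completely", "RB"), ("different", "JJ"),
]

simplified = simplify_tags(pos_tagged)
print(simplified)
```

In practice map_tag is the better choice, since it uses the complete mapping that ships with NLTK. If I recall the signature correctly, pos_tag also accepts a tagset argument, so `pos_tag(text, tagset='universal')` should give the simplified tags in one call.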

answered Oct 04 '22 03:10

Bridgette

Tags: python, nltk, tagging