I was following chapter 5 of the NLTK book, and tagged_words() does not accept the 'simplify_tags' argument used there. I use Python 3.4, PyCharm, and the standard NLTK package.
In[4]: nltk.corpus.brown.tagged_words()
Out[4]: [('The', 'AT'), ('Fulton', 'NP-TL'), ...]
In[5]: nltk.corpus.brown.tagged_words(simplify_tags = True)
Traceback (most recent call last):
File "C:\Python34\lib\site-packages\IPython\core\interactiveshell.py", line 2883, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-5-c4f914e3e846>", line 1, in <module>
nltk.corpus.brown.tagged_words(simplify_tags = True)
TypeError: tagged_words() got an unexpected keyword argument 'simplify_tags'
There is no problem with running this function without simplify_tags. I appreciate any suggestion or input. Thank you!
Yes, as noted, simplify_tags is gone in the latest version. Instead of simplified tags, the tags are now mapped to the universal tagset (https://code.google.com/p/universal-pos-tags/) through the tagset parameter:
>>> from nltk.corpus import brown
>>> brown.tagged_words(tagset='universal')
[(u'The', u'DET'), (u'Fulton', u'NOUN'), ...]
>>> brown.tagged_words(tagset='universal')[:10]
[(u'The', u'DET'), (u'Fulton', u'NOUN'), (u'County', u'NOUN'), (u'Grand', u'ADJ'), (u'Jury', u'NOUN'), (u'said', u'VERB'), (u'Friday', u'NOUN'), (u'an', u'DET'), (u'investigation', u'NOUN'), (u'of', u'ADP')]
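If the same script has to run against both an older NLTK install and a current one, a small wrapper can try the new keyword first and fall back to the old one. This is only a sketch: tagged_words_universal is a hypothetical helper name, not part of NLTK, and it assumes the reader raises TypeError for an unknown keyword, as in the traceback in the question. The old simplified tags and the universal tags are not the same inventory, so the two branches may return different labels.

```python
def tagged_words_universal(corpus):
    """Return (word, tag) pairs with coarse tags (hypothetical helper).

    `corpus` is any corpus reader exposing tagged_words(), e.g. nltk.corpus.brown.
    """
    try:
        # Newer NLTK: ask for the universal tagset.
        return corpus.tagged_words(tagset='universal')
    except TypeError:
        # Older NLTK: the keyword from the earlier edition of the book.
        return corpus.tagged_words(simplify_tags=True)

# Usage (needs NLTK and the Brown corpus installed):
# from nltk.corpus import brown
# tagged_words_universal(brown)[:10]
```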
However, note that one corpus reader still has a simplify_tags parameter; see https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/ipipan.py#L23
Possibly it's in the pipeline for the ipipan corpus reader to move to the universal tagset as well.
Also, note that not all corpus readers can map to the universal tagset yet; for some it is still on the TODO list, e.g. https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/tagged.py#L260
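For a reader without the mapping, one workaround is to remap the tags yourself after loading them. The sketch below uses a hand-written table covering only a few Brown tags, chosen to match the output shown above; BROWN_TO_UNIVERSAL and to_universal are hypothetical names, and a real table would need many more entries (punctuation tags, for instance, are not handled here). NLTK also provides nltk.tag.map_tag for this kind of lookup, which relies on the separately downloaded tagset mapping data.

```python
# Illustrative subset only; not a complete Brown-to-universal table.
BROWN_TO_UNIVERSAL = {
    'AT': 'DET',    # article
    'NP': 'NOUN',   # proper noun
    'NN': 'NOUN',   # common noun
    'JJ': 'ADJ',    # adjective
    'VBD': 'VERB',  # verb, past tense
    'IN': 'ADP',    # preposition
}

def to_universal(tagged, mapping=BROWN_TO_UNIVERSAL, unknown='X'):
    """Map (word, brown_tag) pairs to (word, universal_tag) pairs."""
    result = []
    for word, tag in tagged:
        # Brown appends suffixes such as -TL (title) or -HL (headline);
        # keep only the base tag before the first hyphen.
        base = tag.split('-')[0]
        # Tags missing from the table fall back to 'X' (the "other" category).
        result.append((word, mapping.get(base, unknown)))
    return result

print(to_universal([('The', 'AT'), ('Fulton', 'NP-TL')]))
```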
Question resolved. I'm now following the latest version of the book, which is still being updated and uses the tagset='universal' parameter instead.