NLTK - TypeError: tagged_words() got an unexpected keyword argument 'simplify_tags'

I was following chapter 5 of the NLTK book, and tagged_words() does not seem to accept the 'simplify_tags' argument. I am using Python 3.4, PyCharm, and the standard NLTK package.

In[4]: nltk.corpus.brown.tagged_words()
Out[4]: [('The', 'AT'), ('Fulton', 'NP-TL'), ...]
In[5]: nltk.corpus.brown.tagged_words(simplify_tags = True)
Traceback (most recent call last):
  File "C:\Python34\lib\site-packages\IPython\core\interactiveshell.py", line 2883, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-c4f914e3e846>", line 1, in <module>
    nltk.corpus.brown.tagged_words(simplify_tags = True)
TypeError: tagged_words() got an unexpected keyword argument 'simplify_tags'

The function runs without problems when simplify_tags is omitted. I would appreciate any suggestions or input. Thank you!

Vicky Zhang asked Apr 02 '15 17:04


2 Answers

Yes, as noted, the latest version replaces the simplified tags with a mapping to the universal tagset (https://code.google.com/p/universal-pos-tags/). Pass tagset='universal' instead of simplify_tags=True:

>>> from nltk.corpus import brown
>>> brown.tagged_words(tagset='universal')
[(u'The', u'DET'), (u'Fulton', u'NOUN'), ...]
>>> brown.tagged_words(tagset='universal')[:10]
[(u'The', u'DET'), (u'Fulton', u'NOUN'), (u'County', u'NOUN'), (u'Grand', u'ADJ'), (u'Jury', u'NOUN'), (u'said', u'VERB'), (u'Friday', u'NOUN'), (u'an', u'DET'), (u'investigation', u'NOUN'), (u'of', u'ADP')]

However, do note that there is still one corpus reader that has a simplify_tags parameter; see https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/ipipan.py#L23

Possibly a move to the universal tagset is in the pipeline for the ipipan corpus reader.

Also note that not all corpus readers can map to the universal tagset; some are still on the TODO list, e.g. https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/tagged.py#L260
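
Since readers differ in which keyword they take, one way to avoid the TypeError is to inspect the method's signature before calling it. The following is only a rough sketch: accepts_keyword and simplified_tag_kwargs are hypothetical helper names, and new_style_reader / old_style_reader are stand-ins written for illustration, not real NLTK code. Note that a method declared with **kwargs will report every keyword as accepted, so the check is a hint rather than a guarantee that the tags actually get mapped.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    if param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs catch-all accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def simplified_tag_kwargs(tagged_words):
    """Pick the simplified-tag keyword arguments that `tagged_words` accepts."""
    if accepts_keyword(tagged_words, "tagset"):
        return {"tagset": "universal"}
    if accepts_keyword(tagged_words, "simplify_tags"):
        return {"simplify_tags": True}
    return {}  # neither keyword is available; fall back to the raw tags


# Hypothetical stand-ins for corpus reader methods (assumptions, not NLTK code).
def new_style_reader(fileids=None, tagset=None):
    return []


def old_style_reader(fileids=None, simplify_tags=False):
    return []


print(simplified_tag_kwargs(new_style_reader))
print(simplified_tag_kwargs(old_style_reader))
```

With NLTK and the Brown corpus installed, the same helper could presumably be used as brown.tagged_words(**simplified_tag_kwargs(brown.tagged_words)).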

alvas answered Sep 17 '22 17:09


Question resolved. I'm now following the latest version of the book, which is still being updated; it uses the tagset='universal' parameter instead.

Vicky Zhang answered Sep 18 '22 17:09