Meaning of Stanford Spanish POS Tagger tags

I am tagging Spanish text with the Stanford POS Tagger (via NLTK in Python).

Here is my code:

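import nltk
# Note: newer NLTK releases name this class StanfordPOSTagger.
from nltk.tag.stanford import POSTagger
# Arguments: path to the Spanish model file, then path to the tagger jar.
spanish_postagger = POSTagger('models/spanish.tagger', 'stanford-postagger.jar')
# tag() expects a list of tokens, so split the sentence on whitespace first.
spanish_postagger.tag('esta es una oracion de prueba'.split())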

The result is:

[(u'esta', u'pd000000'),
(u'es', u'vsip000'),
(u'una', u'di0000'),
(u'oracion', u'nc0s000'),
(u'de', u'sp000'),
(u'prueba', u'nc0s000')]

Where can I find documentation on what exactly the tags pd000000, vsip000, di0000, nc0s000 and sp000 mean?

263

asked Nov 20 '14 19:11

Pedro Muñoz

1 Answer

This is a simplified version of the tagset used in the AnCora treebank. You can find their tagset documentation here: https://web.archive.org/web/20160325024315/http://nlp.lsi.upc.edu/freeling/doc/tagsets/tagset-es.html

The "simplification" consists of nulling out many of the final fields which don't strictly belong in a part-of-speech tag. For example, our part-of-speech tagger will always give you null (0) values for the NER field of the original tagset (see EAGLES noun documentation).

In short: the fields in the POS tags produced by our tagger correspond exactly to AnCora POS fields, but a lot of those fields will be null. For most practical purposes you'll only need to look at the first 2–4 characters of the tag. The first character always indicates the broad POS category, and the second character indicates some kind of subtype.
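As a rough illustration of reading only the leading characters, here is a minimal sketch. The helper name split_tag and the BROAD_CATEGORY table are hypothetical, not part of NLTK or the tagger, and the category labels are my reading of the EAGLES-style conventions in the tagset documentation linked above, so check them against that page before relying on them.

```python
# Illustrative, partial mapping from the first tag character to a broad
# category. Assumption: labels follow the EAGLES-style conventions in the
# linked tagset documentation; this table is not shipped with the tagger.
BROAD_CATEGORY = {
    "a": "adjective",
    "c": "conjunction",
    "d": "determiner",
    "f": "punctuation",
    "i": "interjection",
    "n": "noun",
    "p": "pronoun",
    "r": "adverb",
    "s": "adposition",
    "v": "verb",
    "w": "date",
    "z": "numeral",
}


def split_tag(tag):
    """Split a tag into (broad category, subtype character, non-null rest).

    The rest is a list of (1-based position, character) pairs for every
    field after the second character that is not the null value '0'.
    """
    category = BROAD_CATEGORY.get(tag[:1].lower(), "unknown")
    subtype = tag[1:2] or None
    if subtype == "0":
        subtype = None
    rest = [(pos, ch) for pos, ch in enumerate(tag[2:], start=3) if ch != "0"]
    return category, subtype, rest


if __name__ == "__main__":
    # Tags taken from the output shown in the question.
    for tag in ["pd000000", "vsip000", "di0000", "nc0s000", "sp000"]:
        print(tag, split_tag(tag))
```

The subtype character and the remaining fields are returned raw on purpose: their meaning depends on the broad category, so they need to be looked up in the documentation for that category rather than decoded with one shared table.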

We're in the process of writing some introductory documentation for using Spanish with CoreNLP (that means understanding these tags, and much else) right now. For the moment, you can find more information on the first page of our technical documentation.

157

answered Nov 06 '22 16:11

Jon Gauthier

Tags: python, text-mining, stanford-nlp
