The official documentation of token.tag_ in spaCy is as follows:
A fine-grained, more detailed tag that represents the word-class and some basic morphological information for the token. These tags are primarily designed to be good features for subsequent models, particularly the syntactic parser. They are language and treebank dependent. The tagger is trained to predict these fine-grained tags, and then a mapping table is used to reduce them to the coarse-grained .pos tags.
But it doesn't list all of the available tags or explain what each one means. Where can I find the full list?
When you call nlp on a text, spaCy first tokenizes the text to produce a Doc object. The Doc is then processed in several different steps – this is also referred to as the processing pipeline. The pipeline used by the trained pipelines typically includes a tagger, a lemmatizer, a parser and an entity recognizer.
INTJ: interjection, e.g. psst, ouch, bravo, hello.
NOUN: noun, e.g. girl, cat, tree, air, beauty.
NUM: numeral, e.g. 1, 2017, one, seventy-seven, IV, MMXIV.
PART: particle, e.g. 's, not.
PRON: pronoun, e.g. I, you, he, she, myself, themselves, somebody.
A Doc is a sequence of Token objects. Access sentences and named entities, export annotations to numpy arrays, losslessly serialize to compressed binary strings. The Doc object holds an array of TokenC structs. The Python-level Token and Span objects are views of this array, i.e. they don't own the data themselves.
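As a rough illustration of the Doc description above, here is a minimal sketch that builds a Doc with a blank English pipeline (tokenizer only, so no trained model needs to be downloaded), iterates over its tokens, exports two attributes to a numpy array, and round-trips the Doc through bytes. The sample sentence and variable names are only for illustration.

```python
import spacy
from spacy.attrs import LENGTH, ORTH
from spacy.tokens import Doc

# A blank pipeline contains only the tokenizer; no trained components are loaded.
nlp = spacy.blank("en")
doc = nlp("Hello world!")

# A Doc behaves like a sequence of Token objects.
tokens = [token.text for token in doc]
print(tokens)

# Slicing a Doc gives a Span, which is a view onto the same underlying data.
span = doc[0:2]
print(span.text)

# Export token attributes to a numpy array (one row per token).
arr = doc.to_array([ORTH, LENGTH])
print(arr.shape)

# Serialize to bytes and restore into a new Doc that shares the vocab.
data = doc.to_bytes()
restored = Doc(nlp.vocab).from_bytes(data)
print(restored.text)
```

Because the blank pipeline has no tagger, token.tag_ is empty for these tokens; a trained pipeline would be needed to see real tag values.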
Finally I found it inside spaCy's source code: glossary.py. And this link explains the meaning of different tags.
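The glossary can also be queried from Python rather than read in the source. The sketch below uses spacy.explain, which looks a label up in that glossary, and reads the GLOSSARY dictionary from spacy.glossary directly. The describe helper and the sample tags are made up for this example, and the exact set of entries may differ between spaCy versions.

```python
import spacy
from spacy.glossary import GLOSSARY


def describe(tag):
    """Hypothetical helper: return the glossary text for a tag, or None if unknown."""
    return GLOSSARY.get(tag)


# Look up a few labels; fine-grained and coarse-grained tags share one glossary.
for tag in ["NN", "VBZ", "INTJ"]:
    print(tag, "->", spacy.explain(tag))

# Print every entry in the glossary, sorted by label.
for tag, description in sorted(GLOSSARY.items()):
    print(f"{tag}\t{description}")
```

No trained pipeline is needed for either lookup, since the glossary is a plain dictionary shipped with the library.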