What does the dependency-parse output of TurboParser mean?

Question

I have been trying to use the dependency parse trees generated by CMU's TurboParser. It works flawlessly. The problem, however, is that there is very little documentation. I need to precisely understand the output of their parser. For example, the sentence "I solved the problem with statistics." generates the following output:

1   I           _   PRP PRP _   2   SUB
2   solved      _   VBD VBD _   0   ROOT
3   the         _   DT  DT  _   4   NMOD
4   problem     _   NN  NN  _   2   OBJ
5   with        _   IN  IN  _   2   VMOD
6   statistics  _   NNS NNS _   5   PMOD
7   .           _   .   .   _   2   P

I haven't found any documentation that explains what the various columns stand for, or how the indices in the second-to-last column (2, 0, 4, 2, ...) are produced. I also have no idea why two columns are devoted to part-of-speech tags. Any help (or a link to external documentation) would be appreciated.

P.S. If you want to try out their parser, here is their online demo.

P.P.S. Please do not suggest using Stanford's dependency parse output. I am interested in linear programming algorithms, which Stanford's NLP system does not use.

Mathieu Rodic · Accepted Answer

Here is the meaning of each of the columns TurboParser outputs:

1. id of the token, i.e. its one-based index in the sentence
2. original token as it was in the original text
3. lemma, the lemmatized form of the token (empty here, because no lemmatizer has been set)
4. tag (coarse-grained part-of-speech tag)
5. tag (fine-grained part-of-speech tag, which is the same as 4. with TurboParser)
6. morphological features (empty here)
7. head of the token, represented by its index (the root token has a head value of 0)
8. relation of the current token with its head
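
To make the column layout concrete, here is a minimal Python sketch that reads the eight columns into named fields. The field names follow the CoNLL-X column names (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL). The `Token` class and `parse_conll` function are hypothetical helpers written for this illustration and are not part of TurboParser. The sketch splits on any whitespace, which is an assumption that holds for the sample above; CoNLL-X files are normally tab-separated.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """One row of the parser output (illustrative helper, not a TurboParser API)."""
    id: int        # 1. one-based index of the token in the sentence
    form: str      # 2. token as it appeared in the text
    lemma: str     # 3. lemma ("_" when no lemmatizer is set)
    cpostag: str   # 4. coarse-grained part-of-speech tag
    postag: str    # 5. fine-grained part-of-speech tag
    feats: str     # 6. morphological features ("_" when empty)
    head: int      # 7. id of the head token (0 for the root)
    deprel: str    # 8. relation between the token and its head


def parse_conll(text: str) -> List[Token]:
    """Parse one sentence of whitespace-separated, 8-column output."""
    tokens = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        cols = line.split()[:8]  # assumption: no column contains spaces
        tokens.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                            cols[4], cols[5], int(cols[6]), cols[7]))
    return tokens


# Sample copied from the question.
SAMPLE = """\
1   I           _   PRP PRP _   2   SUB
2   solved      _   VBD VBD _   0   ROOT
3   the         _   DT  DT  _   4   NMOD
4   problem     _   NN  NN  _   2   OBJ
5   with        _   IN  IN  _   2   VMOD
6   statistics  _   NNS NNS _   5   PMOD
7   .           _   .   .   _   2   P
"""

tokens = parse_conll(SAMPLE)
for tok in tokens:
    # Head ids are one-based, so the head of a token sits at index head - 1.
    head_form = "ROOT" if tok.head == 0 else tokens[tok.head - 1].form
    print(f"{tok.form} --{tok.deprel}--> {head_form}")
```

Each printed line pairs a token with the token its head index points to, which is how the numbers in the second-to-last column should be read.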

The generated output you gave can be represented as a dependency-based parse tree:

[Image: representation of the dependency-based parse tree]
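
As a rough text-only substitute for the tree, the following Python sketch rebuilds it from the head indices and prints it with indentation. The `ROWS` list is copied by hand from the output in the question (id, token, head, relation), and `build_children` and `render` are hypothetical helper names used only for this illustration.

```python
from collections import defaultdict

# (id, form, head, deprel) taken from the parser output in the question.
ROWS = [
    (1, "I", 2, "SUB"),
    (2, "solved", 0, "ROOT"),
    (3, "the", 4, "NMOD"),
    (4, "problem", 2, "OBJ"),
    (5, "with", 2, "VMOD"),
    (6, "statistics", 5, "PMOD"),
    (7, ".", 2, "P"),
]


def build_children(rows):
    """Map each head id to the ids of its dependents, in sentence order."""
    children = defaultdict(list)
    for tok_id, _form, head, _rel in rows:
        children[head].append(tok_id)
    return children


def render(rows):
    """Return the tree as indented lines, starting from the token(s) with head 0."""
    by_id = {row[0]: row for row in rows}
    children = build_children(rows)
    lines = []

    def visit(tok_id, depth):
        _, form, _, rel = by_id[tok_id]
        lines.append("  " * depth + f"{form} ({rel})")
        for child in children.get(tok_id, []):
            visit(child, depth + 1)

    for root in children.get(0, []):
        visit(root, 0)
    return lines


tree_lines = render(ROWS)
print("\n".join(tree_lines))
```

The token whose head is 0 ("solved") appears at the top level, and every other token is indented under the token its head index refers to.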

For further information on the CoNLL-X format:

http://wacky.sslmit.unibo.it/lib/exe/fetch.php?media=papers:conll-syntax.pdf
http://ilk.uvt.nl/conll/#dataformat
