
What does the dependency-parse output of TurboParser mean?

Tags: nlp, parse-tree

I have been trying to use the dependency parse trees generated by CMU's TurboParser. It works flawlessly. The problem, however, is that there is very little documentation. I need to precisely understand the output of their parser. For example, the sentence "I solved the problem with statistics." generates the following output:

1   I           _   PRP PRP _   2   SUB
2   solved      _   VBD VBD _   0   ROOT
3   the         _   DT  DT  _   4   NMOD
4   problem     _   NN  NN  _   2   OBJ
5   with        _   IN  IN  _   2   VMOD
6   statistics  _   NNS NNS _   5   PMOD
7   .           _   .   .   _   2   P

I haven't found any documentation that explains what the various columns stand for, or how the indices in the second-to-last column (2, 0, 4, 2, ...) are derived. I also don't understand why two columns are devoted to part-of-speech tags. Any help (or a link to external documentation) would be greatly appreciated.

P.S. If you want to try out their parser, here is their online demo.

P.P.S. Please do not suggest using Stanford's dependency parse output. I am interested in algorithms based on linear programming, which is not what Stanford's NLP system uses.

Chthonic Project, asked Jun 24 '14 18:06


1 Answer

TurboParser writes its output in the CoNLL-X format. Here is the meaning of each column:

  1. ID of the token, i.e. its one-based index in the sentence
  2. form: the token as it appeared in the original text
  3. lemma, the lemmatized form of the token (a placeholder underscore here, because no lemmatizer has been set)
  4. coarse-grained part-of-speech tag
  5. fine-grained part-of-speech tag (identical to column 4 in TurboParser's output)
  6. morphological features (a placeholder underscore here)
  7. head of the token, given as the head's ID from column 1 (the root token has a head of 0); for example, "I" has head 2, i.e. "solved"
  8. dependency relation between the token and its head
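
The column layout above can also be read programmatically. The following is a minimal Python sketch, assuming whitespace-separated columns exactly as in the output shown in the question. The names Token and parse_line are illustrative choices, not part of TurboParser; the field names follow the CoNLL-X column names.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """One row of the parser output (illustrative type, not a TurboParser API)."""
    id: int        # column 1: one-based index of the token
    form: str      # column 2: the token as it appeared in the text
    lemma: str     # column 3: lemma, "_" when unavailable
    cpostag: str   # column 4: coarse-grained POS tag
    postag: str    # column 5: fine-grained POS tag
    feats: str     # column 6: morphological features, "_" when unavailable
    head: int      # column 7: id of the head token, 0 for the root
    deprel: str    # column 8: relation between the token and its head


def parse_line(line: str) -> Token:
    # Assumption: columns are separated by whitespace and tokens contain none.
    cols = line.split()
    if len(cols) < 8:
        raise ValueError(f"expected at least 8 columns, got {len(cols)}")
    c = cols[:8]
    return Token(int(c[0]), c[1], c[2], c[3], c[4], c[5], int(c[6]), c[7])


OUTPUT = """\
1   I           _   PRP PRP _   2   SUB
2   solved      _   VBD VBD _   0   ROOT
3   the         _   DT  DT  _   4   NMOD
4   problem     _   NN  NN  _   2   OBJ
5   with        _   IN  IN  _   2   VMOD
6   statistics  _   NNS NNS _   5   PMOD
7   .           _   .   .   _   2   P
"""

tokens = [parse_line(line) for line in OUTPUT.splitlines() if line.strip()]
by_id = {t.id: t for t in tokens}

# Resolve each head index (column 7) to the word it points at.
for t in tokens:
    head_form = "<root>" if t.head == 0 else by_id[t.head].form
    print(f"{t.form} --{t.deprel}--> {head_form}")
```

Looking up column 7 in the id-to-token map is all that is needed to turn the head indices into word-to-word links.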

The generated output you gave can be represented as a dependency-based parse tree:

[Image: representation of the dependency-based parse tree]
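
The same tree can be recovered in text form from columns 1, 7, and 8 alone. Here is a small Python sketch, assuming the rows from the question have already been reduced to (id, form, head, relation) tuples; the function names build_children and render are hypothetical helpers written for this illustration.

```python
from collections import defaultdict

# (id, form, head, relation) taken from the parser output in the question.
ROWS = [
    (1, "I", 2, "SUB"),
    (2, "solved", 0, "ROOT"),
    (3, "the", 4, "NMOD"),
    (4, "problem", 2, "OBJ"),
    (5, "with", 2, "VMOD"),
    (6, "statistics", 5, "PMOD"),
    (7, ".", 2, "P"),
]


def build_children(rows):
    """Map each head id to the ids of its dependents, in sentence order."""
    children = defaultdict(list)
    for tok_id, _form, head, _rel in rows:
        children[head].append(tok_id)
    return children


def render(rows):
    """Return the tree as indented lines, starting from the token(s) with head 0."""
    info = {tok_id: (form, rel) for tok_id, form, _head, rel in rows}
    children = build_children(rows)
    lines = []

    def walk(tok_id, depth):
        form, rel = info[tok_id]
        lines.append("  " * depth + f"{form} ({rel})")
        for child in children.get(tok_id, []):
            walk(child, depth + 1)

    for root in children.get(0, []):
        walk(root, 0)
    return lines


tree_lines = render(ROWS)
print("\n".join(tree_lines))
# solved (ROOT)
#   I (SUB)
#   problem (OBJ)
#     the (NMOD)
#   with (VMOD)
#     statistics (PMOD)
#   . (P)
```

Each level of indentation corresponds to one step along the head links, so "statistics" hangs under "with", which in turn attaches to the root verb "solved".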

For further information on the CoNLL-X format:

  • http://wacky.sslmit.unibo.it/lib/exe/fetch.php?media=papers:conll-syntax.pdf
  • http://ilk.uvt.nl/conll/#dataformat

Mathieu Rodic, answered Nov 15 '22 07:11