I have been trying to use the dependency parse trees generated by CMU's TurboParser. It works flawlessly. The problem, however, is that there is very little documentation. I need to precisely understand the output of their parser. For example, the sentence "I solved the problem with statistics." generates the following output:
1 I _ PRP PRP _ 2 SUB
2 solved _ VBD VBD _ 0 ROOT
3 the _ DT DT _ 4 NMOD
4 problem _ NN NN _ 2 OBJ
5 with _ IN IN _ 2 VMOD
6 statistics _ NNS NNS _ 5 PMOD
7 . _ . . _ 2 P
I haven't found any documentation that can help understand what the various columns stand for, and how the indices in the second-last column (2, 0, 4, 2, ... ) are created. Also, I have no idea why there are two columns devoted to part-of-speech tags. Any help (or link to external documentation) will be of great help.
P.S. If you want to try out their parser, here is their online demo.
P.P.S. Please do not suggest using Stanford's dependency parse output. I am interested in linear programming algorithms, which is not what Stanford's NLP system does.
Here is the meaning of each of the columns TurboParser outputs:
0
)The generated output you gave can be represented as a dependency-based parse tree:
For further information on the CoNLL-X format:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With