I am new to text mining. I am using an open-source JAR (Mate Parser) that produces output in CoNLL 2009 format after dependency parsing. I want to use the dependency parsing results for information extraction. I can understand some of the output, but I cannot make sense of the CoNLL data format as a whole. Can anyone help me understand the CoNLL data format? Any pointers would be appreciated.
There are many different CoNLL formats, since CoNLL hosts a different shared task each year. The format for CoNLL 2009 is described in the CoNLL 2009 shared task documentation. Each line represents a single word as a series of tab-separated fields; an underscore (_) indicates an empty value. Mate-Parser's manual says that it uses the first 12 columns of CoNLL 2009:
ID FORM LEMMA PLEMMA POS PPOS FEAT PFEAT HEAD PHEAD DEPREL PDEPREL
The definitions of some of these columns come from earlier shared tasks (the CoNLL-X format used in 2006 and 2007):
ID (index in sentence, starting at 1)
FORM (word form itself)
LEMMA (word's lemma or stem)
POS (part of speech)
FEAT (list of morphological features separated by |)
HEAD (index of syntactic parent, 0 for ROOT)
DEPREL (syntactic relationship between HEAD and this word)
There are also variants of those columns that start with P (e.g., PPOS rather than POS). The P indicates that the value was automatically predicted rather than taken from a gold standard.
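To make the column layout concrete, here is a minimal Python sketch that reads the 12 columns listed above and turns one sentence into (head, relation, dependent) triples, which is often a convenient starting point for information extraction. The function names and the sample sentence are made up for illustration and are not actual Mate Parser output; the sketch also assumes that sentences are separated by blank lines and that the parser writes its predictions into the P-columns (PHEAD, PDEPREL).

```python
# Column names for the first 12 fields of CoNLL 2009, in the order listed above.
CONLL09_COLUMNS = [
    "ID", "FORM", "LEMMA", "PLEMMA", "POS", "PPOS",
    "FEAT", "PFEAT", "HEAD", "PHEAD", "DEPREL", "PDEPREL",
]

# Hand-written sample (hypothetical values): gold columns are empty ("_"),
# predicted columns are filled in.
SAMPLE = "\n".join("\t".join(row) for row in [
    ["1", "The", "_", "the", "_", "DT", "_", "_", "_", "2", "_", "NMOD"],
    ["2", "cat", "_", "cat", "_", "NN", "_", "_", "_", "3", "_", "SBJ"],
    ["3", "sleeps", "_", "sleep", "_", "VBZ", "_", "_", "_", "0", "_", "ROOT"],
])


def parse_conll09(text):
    """Split CoNLL 2009 text into sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")
        token = {}
        # zip() stops after 12 names, so any extra columns are ignored.
        for name, value in zip(CONLL09_COLUMNS, fields):
            # "_" marks an empty value, except in ID/FORM where it is kept as-is.
            empty = value == "_" and name not in ("ID", "FORM")
            token[name] = None if empty else value
        current.append(token)
    if current:
        sentences.append(current)
    return sentences


def dependency_triples(sentence, predicted=True):
    """Return (head_form, relation, dependent_form) triples for one sentence."""
    head_col = "PHEAD" if predicted else "HEAD"
    rel_col = "PDEPREL" if predicted else "DEPREL"
    forms = {tok["ID"]: tok["FORM"] for tok in sentence}
    triples = []
    for tok in sentence:
        head_id = tok[head_col]
        if head_id is None:
            continue  # no head recorded in this column
        # Head index 0 means the word attaches to the artificial ROOT.
        head_form = "ROOT" if head_id == "0" else forms[head_id]
        triples.append((head_form, tok[rel_col], tok["FORM"]))
    return triples
```

For the sample sentence, dependency_triples(parse_conll09(SAMPLE)[0]) gives ("cat", "NMOD", "The"), ("sleeps", "SBJ", "cat") and ("ROOT", "ROOT", "sleeps"). Passing predicted=False reads the gold HEAD/DEPREL columns instead, which are empty in this sample.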
Update: There is now a CoNLL-U data format as well which extends the CoNLL-X format.
As an update to @dmcc's answer: in CoNLL formats, each word is on its own line and its annotations are stored in columns (with <TAB> as separator).
Be careful when working with tools or libraries that claim to support (some) "CoNLL format". Different CoNLL formats order their columns differently, and the developer might not be aware of that. Such tools are therefore likely not to work as expected if they get data in another (or an unspecified) CoNLL format.
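One way to guard against this in your own code is to name the format explicitly and never address columns by a hard-coded position. The following Python sketch illustrates the idea; COLUMN_LAYOUTS and read_row are hypothetical names, the CoNLL 2009 entry covers only the first 12 columns mentioned above, and the sample line is made up.

```python
# Column orders of three CoNLL formats. Note that HEAD is the 7th column in
# CoNLL-X and CoNLL-U, but the 9th column in CoNLL 2009.
COLUMN_LAYOUTS = {
    "conll-x": ["ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG", "FEATS",
                "HEAD", "DEPREL", "PHEAD", "PDEPREL"],
    "conll-2009": ["ID", "FORM", "LEMMA", "PLEMMA", "POS", "PPOS",
                   "FEAT", "PFEAT", "HEAD", "PHEAD", "DEPREL", "PDEPREL"],
    "conll-u": ["ID", "FORM", "LEMMA", "UPOS", "XPOS", "FEATS",
                "HEAD", "DEPREL", "DEPS", "MISC"],
}


def read_row(line, fmt):
    """Map one tab-separated token line to column names of the given format."""
    names = COLUMN_LAYOUTS[fmt]
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(names):
        # Too few columns usually means the data is in a different format.
        raise ValueError(
            f"expected at least {len(names)} columns for {fmt}, got {len(fields)}"
        )
    return dict(zip(names, fields))


# A hand-written CoNLL-X style line (10 columns, hypothetical values).
CONLLX_LINE = "1\tThe\tthe\tDT\tDT\t_\t2\tNMOD\t_\t_"
```

Here read_row(CONLLX_LINE, "conll-x")["HEAD"] is "2", whereas read_row(CONLLX_LINE, "conll-2009") raises a ValueError instead of silently reading the wrong column as the head. A column count check like this only catches some mismatches (CoNLL-X and CoNLL-U both have 10 columns), so knowing which format a tool actually emits remains necessary.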
For converting between different CoNLL formats, you can consider using CoNLL-RDF (https://github.com/acoli-repo/conll-rdf) or CoNLL-Transform (https://github.com/acoli-repo/conll-transform). (Disclaimer: developed by my lab.)