There seem to be a few different settings:
iobtags
iobTags
entitySubclassification (IOB1 or IOB2?)
evaluateIOB
Which setting do I use, and how do I use it correctly?
I tried labelling like this:
1997 B-DATE
volvo B-BRAND
wia64t B-MODEL
highway B-TYPE
tractor I-TYPE
But on the training output, it seemed to think that B-TYPE and I-TYPE were different classes.
I am using the 2013-11-12 release.
How this can be done is currently (as of the 2013 releases) a bit of a mess, since there are two different sets of flags for two different DocumentReaderAndWriter implementations. Sorry.
The most flexible support for different IOB styles is found in CoNLLDocumentReaderAndWriter. While it is reading files, you can have it map any IOB/IOE/... annotation that uses hyphenated prefixes, like your examples (B-BRAND), to any other such scheme with the flag:
-entitySubclassification IOB2
The resulting label set is then used for training and classification. The options are documented in the entitySubclassify() method of CoNLLDocumentReaderAndWriter: IOB1, IOB2, IOE1, IOE2, SBIEO, IO. You can find a discussion of IOB1 vs. IOB2 in Tjong Kim Sang and Veenstra 1999. By default the representation is mapped back to IOB1 on output, since that is the default used in the CoNLL conlleval program, but you can keep it as what you mapped it to with the flag:
-retainEntitySubclassification
To use this DocumentReaderAndWriter, you can give a training command like:
java8 -mx6g edu.stanford.nlp.ie.crf.CRFClassifier -prop conll.crf.chris2009.prop -readerAndWriter edu.stanford.nlp.sequences.CoNLLDocumentReaderAndWriter -entitySubclassification iob2
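To make the IOB1 vs. IOB2 distinction concrete, here is a minimal Python sketch of an IOB1-to-IOB2 conversion. It is only an illustration of the two labeling schemes, not the code inside entitySubclassify(); the function names split_label and to_iob2 are made up for this example.

```python
def split_label(label):
    """Split 'B-TYPE' into ('B', 'TYPE'). Plain labels such as 'BRAND'
    are treated as inside labels ('I'); 'O' has no entity type."""
    if label == "O":
        return "O", None
    if len(label) > 2 and label[1] == "-" and label[0] in "BI":
        return label[0], label[2:]
    return "I", label


def to_iob2(labels):
    """Rewrite a label sequence so that every entity starts with B-.

    In IOB1, B- is only used when an entity directly follows another
    entity of the same type; in IOB2, B- marks the first token of
    every entity.
    """
    out = []
    prev_type = None
    for label in labels:
        prefix, etype = split_label(label)
        if etype is None:
            out.append("O")
        elif prefix == "B" or etype != prev_type:
            # Explicit B-, or the entity type changed: a new entity starts.
            out.append("B-" + etype)
        else:
            out.append("I-" + etype)
        prev_type = etype
    return out


iob1 = ["I-DATE", "I-BRAND", "I-MODEL", "I-TYPE", "I-TYPE"]
print(to_iob2(iob1))
```

Applied to the IOB1 version of the example in the question, this yields exactly the labels that were written by hand there (B-DATE, B-BRAND, B-MODEL, B-TYPE, I-TYPE).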
Alternatively, ColumnDocumentReaderAndWriter is the default DocumentReaderAndWriter which we use in the distributed models. The options you get with it are different and slightly more limited. You have these two flags:
-mergeTags will take either plain ("BRAND") or CoNLL-like ("I-BRAND") labels, map them down to a prefix-less IO label ("BRAND"), and use that for training and classifying.
-iobTags can take either plain ("BRAND") or CoNLL-like ("I-BRAND") labels and maps them to IOB2.
In a sequence model, for any of the labeling schemes like IOB2, the labels are different classes. That is how these labeling schemes work. The special interpretation of "I-", "B-", etc. is left to the human observer and entity-level evaluation software. The included evaluation software will work with IOB1, IOB2, or prefixless IO encoding only.
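As a rough illustration of that last point, the Python sketch below shows how the tagger's view (five distinct classes) differs from the entity-level view (four entities) for the example in the question. It assumes IOB2 input where every non-O label carries a B- or I- prefix; strip_prefix and extract_entities are hypothetical helper names, not part of the Stanford NER API or its evaluation code.

```python
def strip_prefix(label):
    """Drop a leading B-/I- prefix, giving a prefix-less IO label."""
    prefix, sep, etype = label.partition("-")
    return etype if sep and prefix in ("B", "I") else label


def extract_entities(tokens, labels):
    """Group IOB2-labelled tokens into (entity_type, text) pairs.

    Assumes every non-O label has a B- or I- prefix.
    """
    entities = []
    current_type, current_tokens = None, []
    for token, label in zip(tokens, labels):
        prefix, _, etype = label.partition("-")
        starts_new = prefix == "B" or (prefix == "I" and etype != current_type)
        if label == "O" or starts_new:
            # Close the entity collected so far, if any.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        if label != "O":
            current_type = etype
            current_tokens.append(token)
    if current_tokens:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


tokens = ["1997", "volvo", "wia64t", "highway", "tractor"]
labels = ["B-DATE", "B-BRAND", "B-MODEL", "B-TYPE", "I-TYPE"]

print(sorted(set(labels)))               # the classes the sequence model sees
print(extract_entities(tokens, labels))  # the entity-level interpretation
print([strip_prefix(l) for l in labels]) # prefix-less IO labels
```

The model is trained on B-TYPE and I-TYPE as separate classes, but the entity-level reading joins "highway tractor" into a single TYPE entity.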