I'm aware of the tokenizer options that are available in CoreNLP and I know how to set them in the standard version.
Is there a way to pass these options, e.g. untokenizable=noneKeep, when using the Simple CoreNLP interface?
You can build a Document with properties: pass a Properties object as the first argument to the Document constructor.
package edu.stanford.nlp.examples;

import edu.stanford.nlp.simple.*;
import java.util.*;

public class SimpleExample {

    public static void main(String[] args) {
        // Tokenizer options use the same syntax as in the standard pipeline,
        // so a value such as untokenizable=noneKeep goes in the same place.
        Properties props = new Properties();
        props.setProperty("tokenize.options", "untokenizable=allKeep");
        // The properties are passed along with the text when the Document is built.
        Document doc = new Document(props, "Joe Smith was born in California. He moved to Chicago last year.");
        for (Sentence sent : doc.sentences()) {
            System.out.println(sent.tokens());
            System.out.println(sent.nerTags());
            System.out.println(sent.parse());
        }
    }
}
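If you need to set several tokenizer options at once, it may be easier to build the property value from a map. The following is only a minimal sketch: it assumes that tokenize.options takes a comma-separated list of key=value pairs, as described for PTBTokenizer, and the class name TokenizerOptions and the helper toProperties are hypothetical names that are not part of CoreNLP. It uses only java.util, so the resulting Properties object would still have to be passed to the Document constructor as shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

// Hypothetical helper, not part of CoreNLP.
class TokenizerOptions {

    // Joins the options into "key=value,key=value" and stores the result
    // under the "tokenize.options" property.
    // Assumption: the tokenizer accepts a comma-separated option list.
    static Properties toProperties(Map<String, String> options) {
        String joined = options.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(","));
        Properties props = new Properties();
        props.setProperty("tokenize.options", joined);
        return props;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the options in insertion order.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("untokenizable", "noneKeep");
        options.put("americanize", "false");

        Properties props = toProperties(options);
        System.out.println(props.getProperty("tokenize.options"));
    }
}
```

The returned props can then replace the hand-written Properties object in the example above.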