
How to set tokenizer options when using the simple CoreNLP API?

I'm aware of the tokenizer options that are available in CoreNLP and I know how to set them in the standard version.

Is there a way to pass these options, e.g. untokenizable=noneKeep, when using the Simple CoreNLP interface?

The Coding Monk asked Nov 07 '22 22:11


1 Answer

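You can pass a Properties object to the Document constructor and set the tokenizer options under the tokenize.options key. The example below uses untokenizable=allKeep; substitute untokenizable=noneKeep or any other option as needed.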

package edu.stanford.nlp.examples;

import edu.stanford.nlp.simple.*;

import java.util.*;

public class SimpleExample {

    public static void main(String[] args) {
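        // Tokenizer options go under the "tokenize.options" key.
        Properties props = new Properties();
        props.setProperty("tokenize.options", "untokenizable=allKeep");
        // The Document applies these properties when it runs its annotators.
        Document doc = new Document(props, "Joe Smith was born in California.  He moved to Chicago last year.");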
        for (Sentence sent : doc.sentences()) {
            System.out.println(sent.tokens());
            System.out.println(sent.nerTags());
            System.out.println(sent.parse());
        }
    }

}
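
If you need more than one tokenizer option, the value of tokenize.options is a comma-separated list of key=value pairs. Here is a minimal sketch of building that string; the TokenizerOptions class and its join and toProperties helpers are hypothetical names used for illustration and are not part of CoreNLP. It uses only java.util, so it runs without the CoreNLP jars.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

public class TokenizerOptions {

    // Hypothetical helper: joins option pairs into "key=value,key=value".
    static String join(Map<String, String> options) {
        return options.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(","));
    }

    // Hypothetical helper: wraps the joined string in a Properties object
    // under the "tokenize.options" key used in the answer above.
    static Properties toProperties(Map<String, String> options) {
        Properties props = new Properties();
        props.setProperty("tokenize.options", join(options));
        return props;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is predictable.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("untokenizable", "noneKeep");
        options.put("ptb3Escaping", "false");

        Properties props = toProperties(options);
        // prints "untokenizable=noneKeep,ptb3Escaping=false"
        System.out.println(props.getProperty("tokenize.options"));
    }
}
```

The resulting props object can then be passed to new Document(props, text) exactly as in the example above.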
StanfordNLPHelp answered Nov 14 '22 23:11