How to set tokenizer options when using the simple CoreNLP API?

Question

I'm aware of the tokenizer options that are available in CoreNLP and I know how to set them in the standard version.

Is there a way to pass these options, e.g. untokenizable=noneKeep, when using the Simple CoreNLP interface?

StanfordNLPHelp · Accepted Answer

You can pass a Properties object to the Document constructor. Tokenizer options go in the tokenize.options property.

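package edu.stanford.nlp.examples;

import edu.stanford.nlp.simple.*;

import java.util.*;

public class SimpleExample {

    public static void main(String[] args) {
        // Tokenizer options are passed as a single property value;
        // substitute any other option here, e.g. untokenizable=noneKeep.
        Properties props = new Properties();
        props.setProperty("tokenize.options", "untokenizable=allKeep");
        // The Document is built with these properties, so they apply to its annotators.
        Document doc = new Document(props, "Joe Smith was born in California.  He moved to Chicago last year.");
        for (Sentence sent : doc.sentences()) {
            System.out.println(sent.tokens());   // tokens of the sentence
            System.out.println(sent.nerTags());  // named-entity tag per token
            System.out.println(sent.parse());    // constituency parse tree
        }
    }

}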
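
If you need more than one tokenizer option, tokenize.options takes them as a comma-separated list of name=value pairs. The sketch below is only an illustration of assembling that string with plain java.util classes. The class name TokenizerOptions and the helpers join and toProperties are hypothetical names, not part of CoreNLP, and ptb3Escaping=false is included only as a second example option.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.StringJoiner;

// Hypothetical helper class; not part of CoreNLP.
public class TokenizerOptions {

    // Joins name/value pairs into the comma-separated form used by
    // the "tokenize.options" property, e.g. "a=1,b=2".
    public static String join(Map<String, String> options) {
        StringJoiner joiner = new StringJoiner(",");
        for (Map.Entry<String, String> e : options.entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    // Builds a Properties object that carries the joined tokenizer options.
    public static Properties toProperties(Map<String, String> options) {
        Properties props = new Properties();
        props.setProperty("tokenize.options", join(options));
        return props;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the options in insertion order.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("untokenizable", "noneKeep");
        options.put("ptb3Escaping", "false");

        Properties props = toProperties(options);
        System.out.println(props.getProperty("tokenize.options"));
        // With CoreNLP on the classpath, props would then be passed
        // as in the answer above: new Document(props, text)
    }
}
```

The resulting Properties object can be handed to the Document constructor in the same way as in the answer's example.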

Tags: java, stanford-nlp

The Coding Monk
