
Stanford NLP tokenizer

How can I tokenize a string in a Java class using the Stanford Parser?

I can only find examples of DocumentPreprocessor and PTBTokenizer that read text from an external file:

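  import java.io.FileReader;
  import java.util.List;
  import edu.stanford.nlp.ling.CoreLabel;
  import edu.stanford.nlp.ling.HasWord;
  import edu.stanford.nlp.process.CoreLabelTokenFactory;
  import edu.stanford.nlp.process.DocumentPreprocessor;
  import edu.stanford.nlp.process.PTBTokenizer;

  // option #1: By sentence
  DocumentPreprocessor dp = new DocumentPreprocessor("hello.txt");
  for (List<HasWord> sentence : dp) {
    System.out.println(sentence);
  }

  // option #2: By token
  PTBTokenizer<CoreLabel> ptbt = new PTBTokenizer<>(new FileReader("hello.txt"),
          new CoreLabelTokenFactory(), "");
  while (ptbt.hasNext()) {
    CoreLabel label = ptbt.next();
    System.out.println(label);
  }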

Thanks.

Naveen asked Oct 11 '12 20:10



1 Answer

The PTBTokenizer constructor takes a java.io.Reader, so you can wrap your string in a java.io.StringReader and tokenize the text directly, without going through a file.
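As a minimal sketch of that idea, the snippet below swaps the FileReader from the question for a StringReader. The class name TokenizeStringSketch, the helper method tokenize, and the sample sentence are made up for illustration; the sketch also assumes the Stanford parser (or CoreNLP) jar is on the classpath.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.PTBTokenizer;

// Hypothetical class name, chosen for this illustration only.
class TokenizeStringSketch {

    // Tokenizes an in-memory string instead of the contents of a file.
    static List<String> tokenize(String text) {
        // StringReader is a java.io.Reader, so it can stand in for the FileReader.
        PTBTokenizer<CoreLabel> ptbt = new PTBTokenizer<>(
                new StringReader(text), new CoreLabelTokenFactory(), "");
        List<String> words = new ArrayList<>();
        while (ptbt.hasNext()) {
            CoreLabel label = ptbt.next();
            words.add(label.word()); // keep only the token text
        }
        return words;
    }

    public static void main(String[] args) {
        // Sample input; any String works here.
        for (String word : tokenize("Hello world. This is a test.")) {
            System.out.println(word);
        }
    }
}
```

The empty options string is carried over unchanged from the question's code. Collecting label.word() yields plain strings; keeping the CoreLabel objects instead would preserve the extra information the token factory attaches to each token.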

CapelliC answered Oct 11 '22 16:10
