How to NER and POS tag a pre-tokenized text with Stanford CoreNLP?

Question

I'm using Stanford CoreNLP's Named Entity Recognizer (NER) and Part-of-Speech (POS) tagger in my application. My code tokenizes the text beforehand, so I need to NER- and POS-tag each existing token instead of letting the tools re-tokenize the text. So far I have only found how to do this with command-line options, not programmatically.

How can I NER- and POS-tag pre-tokenized text programmatically with Stanford CoreNLP?

Edit:

I'm actually using the standalone NER and POS tagger APIs: my code follows the tutorials that ship with the Stanford NER and POS tagger packages, although it is the CoreNLP jar that is on my classpath.

Edit:

I just found instructions for setting CoreNLP properties at http://nlp.stanford.edu/software/corenlp.shtml, but I was hoping for a quick way to do this with the standalone Stanford NER and POS taggers so that I don't have to recode everything!

Gabor Angeli · Accepted Answer

If you set the property:

tokenize.whitespace = true

then the CoreNLP pipeline will tokenize on whitespace rather than the default PTB tokenization. You may also want to set:

ssplit.eolonly = true

so that you only split sentences on newline characters.
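As a rough sketch of how these two properties might be set from Java rather than on the command line: the class and method names below (`PreTokenizedConfig`, `buildProps`, `joinTokens`) are made up for illustration, and the annotator list is an assumption about what a POS + NER pipeline typically needs. The code uses only the Java standard library; it builds the `Properties` object and re-joins already-tokenized sentences into the whitespace/newline layout that the two settings expect.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

public class PreTokenizedConfig {

    // Properties for a pipeline that should trust the caller's tokenization.
    public static Properties buildProps() {
        Properties props = new Properties();
        // Assumed annotator list for POS and NER tagging; adjust to your setup.
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner");
        // Split tokens on whitespace only, instead of PTB tokenization.
        props.setProperty("tokenize.whitespace", "true");
        // Split sentences on newlines only.
        props.setProperty("ssplit.eolonly", "true");
        return props;
    }

    // Hypothetical helper: one sentence per line, tokens separated by a space.
    // Tokens must not contain whitespace themselves, or they would be split again.
    public static String joinTokens(List<List<String>> sentences) {
        return sentences.stream()
                .map(tokens -> String.join(" ", tokens))
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        List<List<String>> sentences = Arrays.asList(
                Arrays.asList("Barack", "Obama", "visited", "Paris", "."),
                Arrays.asList("He", "left", "on", "Monday", "."));
        System.out.println(buildProps());
        System.out.println(joinTokens(sentences));
    }
}
```

With CoreNLP and its models on the classpath, the result of `buildProps()` would presumably be passed to `new StanfordCoreNLP(props)`, and the joined text wrapped in an `Annotation` and handed to `pipeline.annotate(...)`. The tags can then be read from each token via `CoreAnnotations.PartOfSpeechAnnotation` and `CoreAnnotations.NamedEntityTagAnnotation`.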

Tags: nlp, stanford-nlp, named-entity-recognition, pos-tagger

Jack Twain
