Annotated Training data for NER corpus

Question

The OpenNLP documentation says that a model should be trained on 15,000 lines to perform well. I need to extract several different entity types from my documents, which means I have to hand-tag many tokens across those 15,000 lines of training data, and that will take a lot of time. Is there a faster way to do this, or some other approach I could take?

Thanks.

smoothsipai · Accepted Answer

Here are some tools:

GATE http://gate.ac.uk/

GATE Teamware (web-based) http://gate.ac.uk/teamware/

XConc Suite http://www-tsujii.is.s.u-tokyo.a...

Sapient (sentence-based) http://www.aber.ac.uk/en/cs/rese...

Knowtator (Protégé plug-in) http://knowtator.sourceforge.net/

CorpusTool http://www.wagsoft.com/CorpusToo...

UIMA CAS Editor http://uima.apache.org/

Callisto http://callisto.mitre.org/

Wordfreak http://wordfreak.sourceforge.net/

MMax2 http://mmax2.sourceforge.net/

reference: https://www.quora.com/Natural-Language-Processing-What-are-the-best-tools-for-manually-annotating-a-text-corpus-with-entities-and-relationships

David Batista · Answer

This one is also worth trying:

brat rapid annotation tool

I've used it myself and recommend it.
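Whichever tool you pick, you still have to get its output into the inline format the OpenNLP name finder trains on (one sentence per line, entities wrapped as `<START:type> ... <END>`). Below is a rough sketch of that conversion from brat's standoff `.ann` lines, not an official converter. The function names `parse_ann` and `to_opennlp` are made up for illustration, and it assumes each document is a single, already whitespace-tokenized sentence, that offsets are relative to that sentence, and that spans do not overlap.

```python
def parse_ann(ann_text):
    """Collect (start, end, label) from brat text-bound lines (IDs starting with 'T')."""
    spans = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip relations, events, notes, etc.
        _id, info, _covered = line.split("\t", 2)
        if ";" in info:
            continue  # discontinuous spans are not handled in this sketch
        label, start, end = info.split(" ")
        # Lowercasing the label is only a choice made in this sketch.
        spans.append((int(start), int(end), label.lower()))
    return sorted(spans)


def to_opennlp(sentence, spans):
    """Wrap each span in <START:label> ... <END>; assumes sorted, non-overlapping spans."""
    out = []
    pos = 0
    for start, end, label in spans:
        out.append(sentence[pos:start])  # text before the entity, unchanged
        out.append(f"<START:{label}> {sentence[start:end]} <END>")
        pos = end
    out.append(sentence[pos:])  # remainder after the last entity
    return "".join(out)


if __name__ == "__main__":
    sentence = "Pierre Vinken , 61 years old , will join the board ."
    ann = "T1\tPerson 0 13\tPierre Vinken"
    print(to_opennlp(sentence, parse_ann(ann)))
```

For real documents with several sentences you would also need to split the text into sentences and shift the offsets accordingly, which this sketch leaves out.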

Tags:

nlp

named-entity-recognition

training-data

corpus

opennlp
