 

Annotated Training data for NER corpus

The OpenNLP documentation says the name finder should be trained on at least 15,000 sentences (one per line) to produce a model that performs well. I need to extract several different entity types from my documents, which means manually tagging many tokens across those 15,000 lines of training data, and that will take a lot of time. Is there another way to do this, or a method that would reduce the time it takes?

Thanks.
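To make the annotation cost concrete: OpenNLP's name finder expects one tokenized sentence per line, with each entity wrapped in `<START:type>` ... `<END>` tags, and an empty line between documents. The following is a minimal Python sketch that renders token-level annotations into that format. The function name and the `(start, end, type)` span representation are hypothetical choices for illustration, not part of OpenNLP.

```python
def to_opennlp_line(tokens, spans):
    """Render one tokenized sentence in OpenNLP name-finder training format.

    tokens: list of token strings, already tokenized.
    spans: (start, end, type) token indices with `end` exclusive; must not overlap.
    """
    starts, ends = {}, set()
    last_end = 0
    for start, end, etype in sorted(spans):
        # reject empty, out-of-range or overlapping spans (OpenNLP has no nesting)
        if not (last_end <= start < end <= len(tokens)):
            raise ValueError(f"invalid or overlapping span: {(start, end, etype)}")
        starts[start] = etype
        ends.add(end)
        last_end = end
    out = []
    for i, tok in enumerate(tokens):
        if i in ends:
            out.append("<END>")  # close the entity that ended before token i
        if i in starts:
            out.append(f"<START:{starts[i]}>")
        out.append(tok)
    if len(tokens) in ends:
        out.append("<END>")  # entity running to the end of the sentence
    return " ".join(out)


if __name__ == "__main__":
    tokens = "Pierre Vinken , 61 years old , will join the board .".split()
    print(to_opennlp_line(tokens, [(0, 2, "person")]))
    # → <START:person> Pierre Vinken <END> , 61 years old , will join the board .
```

Every tagged entity in the 15,000 lines corresponds to one such span, which is why an annotation tool that records spans for you saves most of the effort.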


2 Answers

Here are some tools:

GATE http://gate.ac.uk/

GATE Teamware (web-based) http://gate.ac.uk/teamware/

XConc Suite http://www-tsujii.is.s.u-tokyo.a...

Sapient (sentence-based) http://www.aber.ac.uk/en/cs/rese...

Knowtator (Protégé plug-in) http://knowtator.sourceforge.net/

CorpusTool http://www.wagsoft.com/CorpusToo...

UIMA CAS Editor http://uima.apache.org/

Callisto http://callisto.mitre.org/

Wordfreak http://wordfreak.sourceforge.net/

MMax2 http://mmax2.sourceforge.net/

Reference: https://www.quora.com/Natural-Language-Processing-What-are-the-best-tools-for-manually-annotating-a-text-corpus-with-entities-and-relationships

smoothsipai answered Oct 30 '25 21:10


This one is also worth trying:

brat rapid annotation tool

I've used it myself and recommend it.
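If you annotate in brat, its standoff `.ann` files still have to be converted into OpenNLP's training format. The following is a minimal Python sketch of such a conversion. It assumes brat's text-bound annotation lines (`T1<TAB>Type start end<TAB>text`, with character offsets into the matching `.txt` file and an exclusive end offset), one sentence per line in the `.txt` file, and a simple regex tokenizer that stands in for whatever tokenizer you will use at prediction time. The function names are hypothetical. Discontinuous spans, nested or overlapping entities, and entities whose boundaries do not fall on token boundaries are skipped.

```python
import re

# Hypothetical tokenizer: runs of word characters, or single punctuation marks.
# Replace it with the tokenizer you will use at prediction time.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def parse_brat_entities(ann_text):
    """Return (start, end, type) character spans from brat text-bound (T) lines."""
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # relations (R), events (E), attributes (A), notes (#) are ignored
        _id, type_and_offsets, _surface = line.split("\t", 2)
        etype, offsets = type_and_offsets.split(" ", 1)
        if ";" in offsets:
            continue  # discontinuous span: no OpenNLP equivalent, skipped here
        start, end = map(int, offsets.split())
        entities.append((start, end, etype))
    return entities


def render(tokens, spans):
    """Wrap non-overlapping token spans (start, end exclusive, type) in OpenNLP tags."""
    starts = {s: t for s, _, t in spans}
    ends = {e for _, e, _ in spans}
    out = []
    for i, tok in enumerate(tokens):
        if i in ends:
            out.append("<END>")
        if i in starts:
            out.append(f"<START:{starts[i]}>")
        out.append(tok)
    if len(tokens) in ends:
        out.append("<END>")
    return " ".join(out)


def brat_to_opennlp(text, ann_text):
    """Convert one brat document (.txt and .ann contents) to OpenNLP training lines.

    Assumes one sentence per line in the .txt file.
    """
    entities = parse_brat_entities(ann_text)
    out_lines = []
    offset = 0  # character offset of the current line within the document
    for line in text.split("\n"):
        toks = [(m.start() + offset, m.end() + offset, m.group())
                for m in TOKEN_RE.finditer(line)]
        spans = []
        for start, end, etype in entities:
            inside = [i for i, (ts, te, _) in enumerate(toks) if ts >= start and te <= end]
            # keep the entity only if its boundaries coincide with token boundaries
            if inside and toks[inside[0]][0] == start and toks[inside[-1]][1] == end:
                spans.append((inside[0], inside[-1] + 1, etype))
        kept, last_end = [], 0
        for span in sorted(spans):  # drop nested/overlapping entities
            if span[0] >= last_end:
                kept.append(span)
                last_end = span[1]
        if toks:
            out_lines.append(render([t for _, _, t in toks], kept))
        offset += len(line) + 1  # +1 for the newline removed by split
    return "\n".join(out_lines)
```

Convert each document separately and join the results with an empty line, which OpenNLP treats as a document boundary.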

David Batista answered Oct 30 '25 23:10