 

Semi-automatic annotation tool - How to find RDF triples

I'm developing a semi-automatic annotation tool for medical texts, and I am completely lost on how to find the RDF triples to annotate.

I am currently trying an NLP-based approach. I have already looked into Stanford NER and OpenNLP, and neither of them has a model for extracting disease names.

My questions are:

  • How can I create a new NER model for extracting disease names? Can OpenNLP or Stanford NER help with this?
  • Is there another approach altogether, other than NLP, for extracting RDF triples from text?

Any help would be appreciated! Thanks.

asked Apr 28 '12 at 21:04 by Gavin Spencer



1 Answer

I have done something similar to what you need with both OpenNLP and LingPipe. For my use case, LingPipe's exact dictionary-based chunking was good enough, so I used that. Documentation is available here: http://alias-i.com/lingpipe/demos/tutorial/ne/read-me.html

You can find a small demo here:

  • https://github.com/castagna/nerdf
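To illustrate what exact dictionary-based chunking does, here is a minimal, self-contained Java sketch written from scratch; it is not LingPipe's API. It assumes a naive tokenizer (lower-case, split on anything that is not a letter or digit) and a tiny hypothetical gazetteer. At each token position it takes the longest dictionary phrase that matches, so "type 2 diabetes" wins over "diabetes".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Minimal exact-match dictionary (gazetteer) chunker, written from scratch.
// It mimics the idea of exact dictionary chunking; it does NOT use LingPipe.
public class DictionaryChunker {

    // A matched span: token offsets [start, end), the matched phrase, and its type.
    public static final class Chunk {
        public final int start;
        public final int end;
        public final String phrase;
        public final String type;

        Chunk(int start, int end, String phrase, String type) {
            this.start = start;
            this.end = end;
            this.phrase = phrase;
            this.type = type;
        }

        @Override
        public String toString() {
            return type + "[" + start + "," + end + ") " + phrase;
        }
    }

    // Dictionary keyed by the lower-cased, space-joined token sequence.
    private final Map<String, String> dictionary = new HashMap<>();
    private int maxPhraseLength = 0;

    public void add(String phrase, String type) {
        List<String> tokens = tokenize(phrase);
        dictionary.put(String.join(" ", tokens), type);
        maxPhraseLength = Math.max(maxPhraseLength, tokens.size());
    }

    // Naive tokenizer (an assumption): lower-case, split on non-letters/digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Scan left to right, preferring the longest dictionary phrase at each position.
    public List<Chunk> chunk(String text) {
        List<String> tokens = tokenize(text);
        List<Chunk> chunks = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            Chunk match = null;
            for (int len = Math.min(maxPhraseLength, tokens.size() - i); len > 0; len--) {
                String candidate = String.join(" ", tokens.subList(i, i + len));
                String type = dictionary.get(candidate);
                if (type != null) {
                    match = new Chunk(i, i + len, candidate, type);
                    break;
                }
            }
            if (match != null) {
                chunks.add(match);
                i = match.end; // skip past the matched phrase
            } else {
                i++;
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        DictionaryChunker chunker = new DictionaryChunker();
        // Hypothetical mini-gazetteer; a real one would come from a medical vocabulary.
        chunker.add("diabetes", "DISEASE");
        chunker.add("type 2 diabetes", "DISEASE");
        chunker.add("hypertension", "DISEASE");
        chunker.add("metformin", "DRUG");

        String text = "Patients with type 2 diabetes and hypertension were given metformin.";
        for (Chunk c : chunker.chunk(text)) {
            System.out.println(c);
        }
    }
}
```

The quality of this approach depends almost entirely on the dictionary: it finds only the exact phrases it contains, which is why a trained model becomes attractive when coverage is poor.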

If a gazetteer/dictionary approach isn't good enough for you, you can try creating your own model; OpenNLP also has an API for training models. Documentation is here: http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.namefind.training
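To train a model you first need annotated sentences. The OpenNLP manual describes the name finder training data as plain text with whitespace-separated tokens, one sentence per line, and each entity wrapped in <START:type> ... <END>. One option is to generate a first draft of such lines from dictionary matches and then correct them by hand. The Java sketch below only does the formatting step; it assumes you already have the tokens and sorted, non-overlapping entity spans, and the Span class is a hypothetical helper, not part of OpenNLP.

```java
import java.util.Arrays;
import java.util.List;

// Formats one tokenized sentence as a line of OpenNLP name finder training data:
// whitespace-separated tokens, entities wrapped as <START:type> ... <END>.
public class OpenNlpTrainingLine {

    // Hypothetical span holder: token offsets [start, end) plus an entity type.
    public static final class Span {
        public final int start;
        public final int end;
        public final String type;

        public Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Assumes spans are sorted by start, do not overlap, and lie within the sentence.
    public static String format(List<String> tokens, List<Span> spans) {
        StringBuilder sb = new StringBuilder();
        int next = 0; // index of the next span to open or close
        for (int i = 0; i < tokens.size(); i++) {
            if (next < spans.size() && spans.get(next).start == i) {
                sb.append("<START:").append(spans.get(next).type).append("> ");
            }
            sb.append(tokens.get(i)).append(' ');
            if (next < spans.size() && spans.get(next).end == i + 1) {
                sb.append("<END> ");
                next++;
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Patients", "with", "type", "2", "diabetes", ".");
        List<Span> spans = Arrays.asList(new Span(2, 5, "disease"));
        // Prints: Patients with <START:disease> type 2 diabetes <END> .
        System.out.println(format(tokens, spans));
    }
}
```

A file of such lines is the kind of input the training section of the manual linked above works with; check the manual for the exact trainer options in your OpenNLP version.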

Extracting RDF triples from natural language is a different problem from identifying named entities. NER is a related, and perhaps necessary, step, but it is not enough. To extract an RDF statement from natural language, you need to identify not only the entities that act as the subject and the object of the statement, but also the verb and/or relationship between those entities, and then you need to map all of them to URIs.
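To make the last point concrete, here is a deliberately naive Java sketch of what comes after NER: for each pair of adjacent entity mentions, it checks whether the words between them appear in a small hand-made relation lexicon, and if so emits one N-Triples line. The namespace http://example.org/med/, the Mention class, the lexicon entry and the rule that turns a phrase into a URI are all hypothetical placeholders. A real tool would map entities to an existing vocabulary and would need proper relation extraction rather than exact string matching.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Deliberately naive triple extraction: entity, relation phrase, entity -> N-Triples.
public class TripleSketch {

    // Hypothetical namespace; a real tool would reuse an existing medical vocabulary.
    static final String NS = "http://example.org/med/";

    // An entity mention found by NER: token offsets [start, end).
    public static final class Mention {
        public final int start;
        public final int end;

        public Mention(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Hypothetical URI minting rule: "type 2 diabetes" -> <NS + "type_2_diabetes">.
    static String toIri(String surface) {
        String local = surface.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", "_");
        return "<" + NS + local + ">";
    }

    static String span(List<String> tokens, int start, int end) {
        return String.join(" ", tokens.subList(start, end));
    }

    // For each pair of adjacent mentions (assumed sorted and non-overlapping),
    // look up the words between them in a lexicon of phrase -> predicate name.
    public static List<String> extract(List<String> tokens, List<Mention> mentions,
                                       Map<String, String> relations) {
        List<String> triples = new ArrayList<>();
        for (int k = 0; k + 1 < mentions.size(); k++) {
            Mention subj = mentions.get(k);
            Mention obj = mentions.get(k + 1);
            String between = span(tokens, subj.end, obj.start).toLowerCase(Locale.ROOT);
            String predicate = relations.get(between);
            if (predicate != null) {
                // N-Triples: <subject> <predicate> <object> .
                triples.add(toIri(span(tokens, subj.start, subj.end))
                        + " <" + NS + predicate + "> "
                        + toIri(span(tokens, obj.start, obj.end)) + " .");
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "Metformin", "is", "used", "to", "treat", "type", "2", "diabetes", ".");
        List<Mention> mentions = Arrays.asList(new Mention(0, 1), new Mention(5, 8));
        Map<String, String> relations = Map.of("is used to treat", "treats");
        extract(tokens, mentions, relations).forEach(System.out::println);
    }
}
```

Each of the three parts the answer lists shows up here as a separate, replaceable piece: the mentions (entities), the relation lexicon (verb/relationship), and toIri (mapping to URIs).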

answered Oct 17 '22 at 15:10 by castagna
