Disease named entity recognition

I have a bunch of text documents that describe diseases. Those documents are in most cases quite short and often only contain a single sentence. An example is given here:

Primary pulmonary hypertension is a progressive disease in which widespread occlusion of the smallest pulmonary arteries leads to increased pulmonary vascular resistance, and subsequently right ventricular failure.

What I need is a tool that finds all disease terms (e.g. "pulmonary hypertension" in this case) in the sentences and maps them to a controlled vocabulary like MeSH.

Thanks in advance for your answers!

alex asked Sep 25 '12 08:09


3 Answers

Here are two pipelines that are specifically designed for medical document parsing:

  • Apache cTAKES
  • NLM's MetaMap

Both use UMLS, the Unified Medical Language System, and thus require that you have a (free) license. Both are Java and more or less easy to set up.

Pascal answered Sep 22 '22 04:09


See http://www.ebi.ac.uk/webservices/whatizit/info.jsf

Whatizit is a text processing system that allows you to do textmining tasks on text. The tasks come defined by the pipelines in the drop down list of the above window and the text can be pasted in the text area.

You could also ask biostars: http://www.biostars.org/show/questions/

Pierre answered Sep 20 '22 04:09


There are many tools to do that. Some popular ones:

  • NLTK (python)
  • LingPipe (java)
  • Stanford NER (java)
  • OpenCalais (web service)
  • Illinois NER (java)

Most of them come with predefined models, i.e. they have already been trained on general datasets (news articles, etc.). However, your texts are pretty specific, so you might want to first build an annotated corpus and retrain one of those tools to adapt it to your data.

More simply, as a first test, you can try a dictionary-based approach: compile a list of entity names and perform exact or approximate matching against your text. For instance, this operation is described in LingPipe's tutorial.
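To give a rough idea of what the dictionary-based approach could look like, here is a minimal sketch in plain Python (standard library only, not LingPipe's implementation). The term list, the `PLACEHOLDER-ID-*` identifiers, the function names and the 0.9 threshold are all made up for illustration; in practice you would load the terms and their real identifiers from MeSH.

```python
import re
from difflib import SequenceMatcher

# Hypothetical mini-dictionary: disease term -> placeholder identifier.
# The identifiers are NOT real MeSH IDs; replace them with entries
# loaded from the actual vocabulary.
DISEASE_TERMS = {
    "pulmonary hypertension": "PLACEHOLDER-ID-1",
    "right ventricular failure": "PLACEHOLDER-ID-2",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def find_diseases(text, terms=DISEASE_TERMS, threshold=0.9):
    """Slide a window of the term's word count over the text and keep
    windows whose character-level similarity to the term reaches the
    threshold (1.0 means an exact match)."""
    tokens = tokenize(text)
    hits = []
    for term, concept_id in terms.items():
        n = len(term.split())
        for i in range(len(tokens) - n + 1):
            window = " ".join(tokens[i:i + n])
            score = SequenceMatcher(None, window, term).ratio()
            if score >= threshold:
                hits.append((window, concept_id, round(score, 2)))
    return hits


sentence = (
    "Primary pulmonary hypertension is a progressive disease in which "
    "widespread occlusion of the smallest pulmonary arteries leads to "
    "increased pulmonary vascular resistance, and subsequently right "
    "ventricular failure."
)
for match in find_diseases(sentence):
    print(match)
```

Lowering the threshold makes the matcher more tolerant of misspellings but also produces more false positives, so it is worth tuning on a handful of your own sentences. This brute-force scan is fine for short documents and a small term list; for a full vocabulary you would want an indexed matcher instead.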

Vincent Labatut answered Sep 21 '22 04:09