Disease named entity recognition

Question

I have a bunch of text documents that describe diseases. Those documents are in most cases quite short and often only contain a single sentence. An example is given here:

Primary pulmonary hypertension is a progressive disease in which widespread occlusion of the smallest pulmonary arteries leads to increased pulmonary vascular resistance, and subsequently right ventricular failure.

What I need is a tool that finds all disease terms (e.g. "pulmonary hypertension" in this case) in the sentences and maps them to a controlled vocabulary like MeSH.

Thanks in advance for your answers!

Pascal · Accepted Answer

Here are two pipelines that are specifically designed for medical document parsing:

Apache cTAKES
NLM's MetaMap

Both use UMLS, the Unified Medical Language System, and thus require that you have a (free) license. Both are Java and more or less easy to set up.

Pierre · Answer

See http://www.ebi.ac.uk/webservices/whatizit/info.jsf

Whatizit is a text processing system that allows you to do text-mining tasks on text. The tasks are defined by the pipelines in the drop-down list on that page, and the text can be pasted into the text area.

You could also ask on Biostars: http://www.biostars.org/show/questions/

Vincent Labatut · Answer

There are many tools to do that. Some popular ones:

NLTK (Python)
LingPipe (Java)
Stanford NER (Java)
OpenCalais (web service)
Illinois NER (Java)

Most of them come with predefined models, i.e. they have already been trained on general-purpose datasets (news articles, etc.). However, your texts are pretty specific, so you might want to first build an annotated corpus and retrain one of those tools in order to adapt it to your data.
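To give an idea of what building such a corpus involves: many NER trainers expect token-level labels in a BIO-style scheme, although the exact input format depends on the tool you pick. Below is a minimal sketch that turns character-span annotations into BIO tags. The function name `bio_tags`, the `DISEASE` label and the simple regex tokenizer are my own assumptions for illustration and do not come from any of the tools listed above.

```python
import re

def bio_tags(text, spans, label="DISEASE"):
    """Tokenize `text` and tag each token B-/I-<label> or O.

    `spans` is a list of (start, end) character offsets of annotated
    entities (hypothetical annotation format, end is exclusive).
    """
    tagged = []
    # Simple tokenizer: runs of word characters, or single punctuation marks.
    for m in re.finditer(r"\w+|[^\w\s]", text):
        tag = "O"
        for start, end in spans:
            if m.start() >= start and m.end() <= end:
                # The first token of a span gets B-, the following ones I-.
                tag = ("B-" if m.start() == start else "I-") + label
                break
        tagged.append((m.group(), tag))
    return tagged

text = "Primary pulmonary hypertension is a progressive disease."
entity = "pulmonary hypertension"
start = text.index(entity)
spans = [(start, start + len(entity))]

for token, tag in bio_tags(text, spans):
    print(token, tag)
```

You would then convert these (token, tag) pairs into whatever file format your chosen tool reads for training.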

More simply, as a first test, you can try a dictionary-based approach: compile a list of entity names and perform exact or approximate matching against your text. For instance, this operation is described in LingPipe's tutorial.
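If you want to try the dictionary idea before installing anything, here is a rough sketch in plain Python using only the standard library. It is not LingPipe's implementation. The vocabulary, the placeholder identifiers (which are not real MeSH IDs), the function names and the 0.9 similarity threshold are all assumptions made up for the example; in practice you would load the term list from your controlled vocabulary.

```python
import re
from difflib import SequenceMatcher

# Hypothetical mini-vocabulary: term -> placeholder identifier.
# These are NOT real MeSH IDs; load the real mapping from your vocabulary.
VOCAB = {
    "pulmonary hypertension": "TERM:0001",
    "right ventricular failure": "TERM:0002",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def find_terms(text, vocab=VOCAB, threshold=0.9):
    """Return (matched_text, term, term_id, score) for each window of tokens
    whose similarity to a vocabulary term reaches `threshold`.
    A score of 1.0 is an exact match; lower scores are approximate matches."""
    tokens = tokenize(text)
    hits = []
    for term, term_id in vocab.items():
        term_tokens = tokenize(term)
        n = len(term_tokens)
        target = " ".join(term_tokens)
        # Slide a window with the same number of tokens as the term.
        for i in range(len(tokens) - n + 1):
            window = " ".join(tokens[i:i + n])
            score = SequenceMatcher(None, window, target).ratio()
            if score >= threshold:
                hits.append((window, term, term_id, round(score, 2)))
    return hits

sentence = (
    "Primary pulmonary hypertension is a progressive disease in which "
    "widespread occlusion of the smallest pulmonary arteries leads to "
    "increased pulmonary vascular resistance, and subsequently right "
    "ventricular failure."
)
for hit in find_terms(sentence):
    print(hit)
```

Setting `threshold=1.0` gives you exact matching only. Lowering it tolerates spelling variants but also increases false positives, so it is worth tuning on a few of your own documents. Note that this scans every window for every term, which is fine for a first test but slow for a large vocabulary.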

Tags: machine-learning, nlp, named-entity-recognition, medical
