
Should I use LingPipe or NLTK for extracting names and places?

Tags: nlp, nltk, lingpipe

I'm looking to extract names and places from very short bursts of text, for example:

"cardinals vs jays in toronto"
"Daniel Nestor and Nenad Zimonjic play Jonas Bjorkman w/ Kevin Ullyett, paris time to be announced"
"jenson button - pole position, brawn-mercedes - monaco"

This data is currently in a MySQL database, and I (pretty much) have a separate record for each athlete, though names are sometimes misspelled or otherwise inconsistent.

I would like to extract the athletes and locations. I usually work in PHP, but haven't been able to find a library for entity extraction (and I may want to get deeper into some NLP and ML in the future).

From what I've found, LingPipe and NLTK seem to be the most recommended, but I can't figure out if either will really suit my purpose, or if something else would be better.

I haven't programmed in either Java or Python, so before I start learning new languages, I'm hoping to get some advice on what route I should follow, or other recommendations.

pedalpete asked Oct 31 '09 22:10




1 Answer

What you're describing is named entity recognition (NER), so I'd recommend checking out the other questions on this topic if you haven't already seen them. This looks like the most useful answer to me.
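Since you already have a record for each athlete, one simple way to get started before reaching for a trained model is a dictionary (gazetteer) lookup with approximate matching, which also tolerates some misspellings. This is only a rough sketch using Python's standard library, not something NLTK or LingPipe provides: the `ATHLETES` and `PLACES` lists, the function names, and the 0.8 cutoff are all assumptions for illustration, and in practice the lists would be loaded from your database.

```python
import difflib

# Hypothetical reference lists; real ones would come from the MySQL tables.
ATHLETES = ["daniel nestor", "nenad zimonjic", "jonas bjorkman",
            "kevin ullyett", "jenson button"]
PLACES = ["toronto", "paris", "monaco"]


def ngrams(tokens, max_len):
    """Yield every run of 1..max_len consecutive tokens, longest runs first."""
    for size in range(max_len, 0, -1):
        for start in range(len(tokens) - size + 1):
            yield " ".join(tokens[start:start + size])


def find_entities(text, known, cutoff=0.8, max_len=3):
    """Return entries of `known` that approximately match some n-gram of `text`."""
    tokens = text.lower().replace(",", " ").split()
    found = []
    for candidate in ngrams(tokens, max_len):
        # get_close_matches returns the best entries scoring at least `cutoff`.
        match = difflib.get_close_matches(candidate, known, n=1, cutoff=cutoff)
        if match and match[0] not in found:
            found.append(match[0])
    return found
```

Calling `find_entities(text, ATHLETES)` and `find_entities(text, PLACES)` on each record gives the two lists separately. The cutoff trades missed misspellings against false matches, so it would need tuning on real data.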

I can't really comment on whether NLTK or LingPipe is better suited to this task, although judging from those answers there are quite a few other resources written in Java.

One advantage of going with NLTK is that Python is very accessible as a language. The other advantage is that the NLTK book (which is available for free) offers an introduction to both Python and NLTK at the same time, which would be useful for you.
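To give a feel for what this looks like in NLTK, here is a minimal sketch built on its pre-trained tokenizer, part-of-speech tagger, and `ne_chunk` chunker. It assumes NLTK and the corresponding data packages are installed (the package names passed to `nltk.download` differ between NLTK versions, so they are not listed here). The helper names and the choice of labels to keep are my own. The pre-trained models are likely to do worse on all-lowercase text such as "cardinals vs jays in toronto" than on properly capitalized sentences.

```python
def entities_from_tree(tree, wanted=("PERSON", "GPE", "LOCATION")):
    """Collect (label, text) pairs from a chunk tree such as ne_chunk returns."""
    found = []
    for node in tree:
        # Plain tokens are (word, tag) tuples; recognized entities are
        # subtrees that carry a label such as PERSON or GPE.
        if hasattr(node, "label") and node.label() in wanted:
            text = " ".join(word for word, _tag in node.leaves())
            found.append((node.label(), text))
    return found


def extract_entities(sentence):
    """Tokenize, POS-tag, and chunk one sentence with NLTK's pre-trained models."""
    import nltk  # third-party package; its model data must be downloaded first

    tokens = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokens)
    return entities_from_tree(nltk.ne_chunk(tagged))
```

`extract_entities("Daniel Nestor and Nenad Zimonjic play in Paris")` would return a list of `(label, text)` pairs, with the exact results depending on the model version.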

nedned answered Oct 12 '22 23:10