I would like to use named entity recognition (NER) to find suitable tags for texts in a database.
I know there is a Wikipedia article about this and lots of other pages describing NER, but I would prefer to hear something about this topic from you.
Example:
"Last year, I was in London where I saw Barack Obama." => Tags: London, Barack Obama
I hope you can help me. Thank you very much in advance!
Named Entity Recognition must both identify named entities in a text and assign each one to a category. There are two main kinds of models used to achieve this goal: ontology-based models and deep-learning-based models.
When we read a text, we naturally recognize named entities like people, values, locations, and so on. For example, in the sentence “Mark Zuckerberg is one of the founders of Facebook, a company from the United States” we can identify three types of entities: “Person”: Mark Zuckerberg. “Company”: Facebook. “Location”: United States.
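To make the identify-and-categorize idea concrete, here is a minimal sketch of the simplest dictionary-lookup flavour of tagging, using only the Python standard library. The `GAZETTEER` table and the `find_entities` function are hypothetical names made up for this illustration, and the entries are hand-written; a real ontology-based system would draw on a much larger knowledge base and would also have to handle ambiguity and spelling variants.

```python
import re

# Hypothetical, hand-made gazetteer: surface form -> entity type.
GAZETTEER = {
    "Mark Zuckerberg": "Person",
    "Facebook": "Company",
    "United States": "Location",
    "London": "Location",
    "Barack Obama": "Person",
}


def find_entities(text, gazetteer=GAZETTEER):
    """Return (start offset, surface form, type) for each gazetteer hit, in text order."""
    hits = []
    for name, entity_type in gazetteer.items():
        # Word boundaries keep a name from matching inside a longer word.
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            hits.append((match.start(), name, entity_type))
    # Sorting the tuples orders the hits by their position in the text.
    return sorted(hits)


sentence = ("Mark Zuckerberg is one of the founders of Facebook, "
            "a company from the United States")
for _start, name, entity_type in find_entities(sentence):
    print(f"{entity_type}: {name}")
```

A lookup like this only finds names it already knows, which is why statistical or neural models are used when new names have to be recognized.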
Note that "NER model" can also mean something unrelated to named entity recognition: it is one of a number of methods for determining the accuracy of live subtitles in television broadcasts and events that are produced using speech recognition. In that context the three letters stand for number, edition error and recognition error.
To start with, check out http://www.nltk.org/ if you plan on working with Python. As far as I know the code isn't "industrial strength", but it will get you started.
Check out section 7.5 of http://nltk.googlecode.com/svn/trunk/doc/book/ch07.html, but to understand the algorithms you will probably have to read through a lot of the book.
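To give a rough idea of what that looks like in practice, here is a minimal sketch built on NLTK's `word_tokenize`, `pos_tag` and `ne_chunk`. It assumes NLTK is installed and that the tokenizer, POS tagger and NE chunker data have already been fetched with `nltk.download` (the exact package names depend on your NLTK version). The helper names `entities_from_tree` and `tag_text` are my own, not part of NLTK.

```python
def entities_from_tree(tree):
    """Collect (entity text, label) pairs from a chunk tree such as nltk.ne_chunk returns."""
    entities = []
    for node in tree:
        # Named-entity chunks are subtrees with a label;
        # ordinary tokens are plain (word, tag) tuples.
        if hasattr(node, "label"):
            text = " ".join(word for word, _tag in node.leaves())
            entities.append((text, node.label()))
    return entities


def tag_text(text):
    """Tokenize, POS-tag and NE-chunk the text, then return the entities found."""
    # Imported lazily so the helper above works even without NLTK installed.
    import nltk

    tokens = nltk.word_tokenize(text)
    tagged = nltk.pos_tag(tokens)
    tree = nltk.ne_chunk(tagged)
    return entities_from_tree(tree)
```

Calling `tag_text("Last year, I was in London where I saw Barack Obama.")` gives you a list of (text, label) pairs whose text part you can store as tags. Which entities and labels actually come back depends on the chunker model, so check the results on your own data.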
Also check out http://nlp.stanford.edu/software/CRF-NER.shtml. It's written in Java.
NER isn't an easy subject, and probably nobody will tell you "this is the best algorithm"; most of them have their pros and cons.
My 0.05 of a dollar.
Cheers,