 

Entity Recognition and Sentiment Analysis using NLP

So, this question might be a little naive, but I thought asking the friendly people of Stack Overflow wouldn't hurt.

My current company has been using a third-party API for NLP for a while now. We basically URL-encode a string and send it over; they extract certain entities for us (we have a list of entities that we're looking for) and return a JSON mapping of entity : sentiment. We've recently decided to bring this project in-house instead.

For the past two days I've been studying NLTK, Stanford NLP and LingPipe, and I can't figure out whether I'm basically reinventing the wheel with this project.

We already have massive tables: one containing the original unstructured text, and another containing the entities extracted from that text along with their sentiment. The entities are single words. For example:

Unstructured text : Now for the bed. It wasn't the best.

Entity : Bed

Sentiment : Negative

I believe that means we have training data (the unstructured text) along with the entities and their sentiments. But how can I go about using this training data with one of the NLP frameworks to get what we want? No clue. I roughly have the steps, but I'm not sure:

  1. Tokenize sentences
  2. Tokenize words
  3. Find the noun in the sentence (POS tagging)
  4. Find the sentiment of that sentence.
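The four steps above can be sketched as follows. This is a minimal, standard-library-only sketch under several assumptions: the regex sentence splitter and tokenizer stand in for NLTK's `sent_tokenize` and `word_tokenize`; step 3 looks tokens up in a hypothetical entity list instead of running a POS tagger (the question says you already have a list of entities); and step 4 uses a tiny hypothetical sentiment lexicon with naive negation handling rather than a trained model. Running it on the example reproduces the problem raised next: "bed" comes out neutral because the negative words are in the following sentence.

```python
import re

# Hypothetical word lists for illustration only; a real system would learn
# these from the existing labeled (text, entity, sentiment) rows.
POSITIVE = {"best", "good", "great", "comfortable", "clean"}
NEGATIVE = {"worst", "bad", "dirty", "noisy", "uncomfortable"}
NEGATORS = {"not", "n't", "never", "no"}
ENTITIES = {"bed", "room", "staff"}  # hypothetical entity list


def split_sentences(text):
    # Step 1: naive split after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    # Step 2: lowercase and split off "n't" and punctuation ("wasn't" -> "was", "n't").
    return re.findall(r"n't|\w+(?=n't)|\w+|[^\w\s]", sentence.lower())


def sentence_sentiment(tokens):
    # Step 4: +1/-1 per lexicon word, flipped if a negator occurs
    # in the three tokens before it.
    score = 0
    for i, tok in enumerate(tokens):
        polarity = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        if polarity and NEGATORS.intersection(tokens[max(0, i - 3):i]):
            polarity = -polarity
        score += polarity
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def per_sentence_pipeline(text):
    results = {}
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        label = sentence_sentiment(tokens)
        # Step 3: entity lookup in place of POS tagging.
        for tok in tokens:
            if tok in ENTITIES:
                results[tok] = label
    return results


print(per_sentence_pipeline("Now for the bed. It wasn't the best."))
# {'bed': 'neutral'}
```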

But wouldn't that fail for the example above, since the bed is mentioned in one sentence and the sentiment about it is expressed in the next?
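One crude way around the two-sentence case, offered as a hypothetical heuristic rather than a real solution: remember the last entity mentioned, and when a later sentence names no entity but contains a pronoun such as "it", attribute that sentence's sentiment to the remembered entity. The principled version of this is coreference resolution (Stanford CoreNLP, for example, includes a coreference annotator). The word lists, entity list and regex tokenizer here are illustrative assumptions.

```python
import re

# Hypothetical word lists for illustration only.
POSITIVE = {"best", "good", "great", "comfortable", "clean"}
NEGATIVE = {"worst", "bad", "dirty", "noisy", "uncomfortable"}
NEGATORS = {"not", "n't", "never", "no"}
PRONOUNS = {"it", "this", "that", "they"}
ENTITIES = {"bed", "room", "staff"}  # hypothetical entity list


def tokenize(sentence):
    # Lowercase and split off "n't" and punctuation ("wasn't" -> "was", "n't").
    return re.findall(r"n't|\w+(?=n't)|\w+|[^\w\s]", sentence.lower())


def sentence_sentiment(tokens):
    # +1/-1 per lexicon word, flipped if a negator occurs in the 3 tokens before it.
    score = 0
    for i, tok in enumerate(tokens):
        polarity = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        if polarity and NEGATORS.intersection(tokens[max(0, i - 3):i]):
            polarity = -polarity
        score += polarity
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def entity_sentiments(text):
    results = {}
    last_entity = None
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        tokens = tokenize(sentence)
        label = sentence_sentiment(tokens)
        mentioned = [t for t in tokens if t in ENTITIES]
        if mentioned:
            last_entity = mentioned[-1]
        elif last_entity and PRONOUNS.intersection(tokens):
            # No entity named here, so assume the pronoun refers back
            # to the most recently mentioned entity.
            mentioned = [last_entity]
        for entity in mentioned:
            # Let a non-neutral sentence override an earlier neutral label.
            if label != "neutral" or entity not in results:
                results[entity] = label
    return results


print(entity_sentiments("Now for the bed. It wasn't the best."))
# {'bed': 'negative'}
```

This breaks as soon as "it" refers to something other than the last listed entity, which is why a trained coreference component is worth evaluating.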

So the question: does anyone know what the best framework would be for accomplishing the above tasks, and of any tutorials on it? (Note: I'm not asking for a solution.) If you've done this before, is this task too large to take on? I've looked at some commercial APIs, but they're absurdly expensive to use (we're a tiny startup).

Thanks, Stack Overflow!

asked Mar 25 '14 20:03 by user3457860


2 Answers

OpenNLP may also be a library worth looking at. At least it has a small tutorial on training the name finder and on using the document categorizer for sentiment analysis. To train the name finder, you have to prepare training data by tagging the entities in your text with SGML tags.

http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind.training
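Following that suggestion, the existing tables could be exported into the two training formats the OpenNLP manual describes: for the name finder, one whitespace-tokenized sentence per line with entities wrapped in `<START:type> ... <END>` markup; for the document categorizer, one document per line with the category as the first token. This sketch assumes a simple regex tokenizer and sentence splitter and a hypothetical entity type name `entity`; in practice the training data should be tokenized the same way as the text the models will later run on. Note that the document categorizer labels a whole document, so it yields one sentiment per text rather than per entity.

```python
import re


def split_sentences(text):
    # Naive split after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text):
    # Assumed tokenizer: words (keeping contractions like "wasn't" whole) and punctuation.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)


def to_namefinder_lines(text, entity, entity_type="entity"):
    # One sentence per line; each token matching the entity (case-insensitively)
    # is wrapped in <START:type> ... <END>. "entity" is a hypothetical type name.
    lines = []
    for sentence in split_sentences(text):
        tokens = []
        for tok in tokenize(sentence):
            if tok.lower() == entity.lower():
                tokens.append(f"<START:{entity_type}> {tok} <END>")
            else:
                tokens.append(tok)
        lines.append(" ".join(tokens))
    return lines


def to_doccat_line(text, sentiment):
    # One document per line: category first, then the tokenized text.
    return sentiment.lower() + " " + " ".join(tokenize(text))


# A row pair from the question's tables: (text, entity, sentiment).
text, entity, sentiment = "Now for the bed. It wasn't the best.", "Bed", "Negative"
for line in to_namefinder_lines(text, entity):
    print(line)
print(to_doccat_line(text, sentiment))
# Now for the <START:entity> bed <END> .
# It wasn't the best .
# negative Now for the bed . It wasn't the best .
```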

answered Sep 18 '22 15:09 by Kai Mysliwiec


NLTK provides a naive NER tagger along with resources, but it doesn't fit every case (including finding dates). However, NLTK allows you to modify and customize the NER tagger according to your requirements. This link might give you some ideas, with basic examples of how to customize it. Also, if you are comfortable with Scala and functional programming, this is one tool you cannot afford to miss.
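As an illustration of the kind of customization meant here (a swapped-in, stand-alone rule, not NLTK's own API), this sketch tags dates, the case mentioned above, with hand-written patterns and emits IOB labels (`B-DATE`, `I-DATE`, `O`). IOB is the scheme NLTK's chunk utilities such as `nltk.chunk.tree2conlltags` work with, so output like this could in principle be merged with, or used to train, an NLTK chunker. The month list and patterns are assumptions and only cover forms like "March 25 , 2014" and "3/25/2014".

```python
import re

MONTHS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
}
NUMERIC_DATE = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}$")  # e.g. 3/25/2014
DAY = re.compile(r"^\d{1,2}(st|nd|rd|th)?$")             # e.g. 25, 25th
YEAR = re.compile(r"^\d{4}$")


def tag_dates(tokens):
    """Return (token, IOB tag) pairs, marking date spans as B-DATE/I-DATE."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        if NUMERIC_DATE.match(tokens[i]):
            tags[i] = "B-DATE"
            i += 1
        elif (tokens[i].lower() in MONTHS and i + 1 < len(tokens)
              and DAY.match(tokens[i + 1])):
            # Month followed by a day, optionally followed by "," and a year, or a year.
            tags[i], tags[i + 1] = "B-DATE", "I-DATE"
            j = i + 2
            if (j + 1 < len(tokens) and tokens[j] == ","
                    and YEAR.match(tokens[j + 1])):
                tags[j] = tags[j + 1] = "I-DATE"
                j += 2
            elif j < len(tokens) and YEAR.match(tokens[j]):
                tags[j] = "I-DATE"
                j += 1
            i = j
        else:
            i += 1
    return list(zip(tokens, tags))


print(tag_dates(["We", "stayed", "on", "March", "25", ",", "2014", "."]))
```

Month names such as "may" are ambiguous, which is one reason hand-written rules like these are usually combined with, rather than substituted for, a trained tagger.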

Cheers...!

answered Sep 16 '22 15:09 by Aravind Asok