Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to identify ideas and concepts in a given text

I'm working on a project at the moment where it would be really useful to be able to detect when a certain topic/idea is mentioned in a body of text. For instance, if the text contained:

Maybe if you tell me a little more about who Mr Jones is, that would help. It would also be useful if I could have a description of his appearance, or even better a photograph?

It'd be great to be able to detect that the person has asked for a photograph of Mr Jones. I could take a really naïve approach and just look for the word "photo" or "photograph", but this would obviously be no good if they wrote something like:

Please, never send me a photo of Mr Jones.

Does anyone know where to start with this? Is it even possible?

I've looked into things like nltk, but I've yet to find an example of someone doing something similar and am still not entirely sure what this kind of analysis is called. Any help that can get me off the ground would be great.

Thanks!

like image 993
Nick Avatar asked May 17 '10 22:05

Nick


2 Answers

The best thing out there that might be useful to you is automatic sentiment analysis. This is used, for example, to judge whether, say, a customer review is positive or negative. I cannot give you direct pointers to available tools, but this is what you are looking for.

I must say, though, that this is a current hot topic in natural language processing and I’ve seen a number of papers at conferences. It’s definitely quite a complex matter and if you’re starting from scratch, it might take quite some time before you get the results that you want.

like image 152
Ventzi Zhechev Avatar answered Oct 15 '22 02:10

Ventzi Zhechev


NLTK is not a bad framework for parsing natural language but beware that this is not a simple matter. Doing stuff like this is really research level programming.

A good thing that makes it much easier is if you have a very limited domain - say your application focuses on information about famous writers, then you can avoid some complexities of natural language like certain types of ambiguities.

Where to start? Good question. I don't know of any tutorials on the topic (and I presume you tried the Google option) but I'd imagine that iTunes U would have a course on the topic. If not I can post a link to a course I've done that mentions the subject and wasn't completely horrible: http://www.inf.ed.ac.uk/teaching/courses/inf2a/lecturematerials/index.html#lecture01

like image 32
Jakub Hampl Avatar answered Oct 15 '22 01:10

Jakub Hampl