
Comparing sentences according to their meaning

Python's NLTK library offers a large collection of texts and corpora, along with a slew of text mining and processing methods. Is there any way to compare sentences based on the meaning they convey, to find a possible match? That is, an intelligent sentence matcher?

For example, take a sentence like "giggling at bad jokes" and the sentence "I like to laugh myself silly at poor jokes". Both convey the same meaning, but the sentences don't remotely match (the words are different, so Levenshtein distance would fail badly!).
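To make the point about edit distance concrete, here is a minimal sketch in plain Python (no NLTK needed). The helper names `levenshtein` and `normalized_similarity` are my own, not from any library, and the normalization by the longer length is just one common choice.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_similarity(a, b):
    """Map the distance to [0, 1]; 1.0 means the sequences are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


s1 = "giggling at bad jokes"
s2 = "I like to laugh myself silly at poor jokes"

print(normalized_similarity(s1, s2))                  # character level
print(normalized_similarity(s1.split(), s2.split()))  # word level
```

At the word level only "at" and "jokes" line up, so the score stays low even though the two sentences mean the same thing.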

Now imagine we have an API that exposes functionality such as that found here. Based on that, we have a mechanism to find out that the words giggle and laugh match in the meaning they convey. Bad won't match up to poor, so we may need to add further layers (for example, they match in the context of a word like joke, since a bad joke is generally the same as a poor joke, although a bad person is not the same as a poor person!).

A major challenge would be to discard material that doesn't much alter the meaning of the sentence. So the algorithm should return the same degree of match between the first sentence and this one: I like to laugh myself silly at poor jokes, even though they are completely senseless, full of crap and serious chances of heart-attack!
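A rough sketch of the behaviour described so far, assuming the synonym information is already available: the `CONCEPTS` and `CONTEXT_CONCEPTS` tables below are hypothetical, hand-written stand-ins for whatever a thesaurus API or trained model would return. The score is the fraction of the query's concepts found in the candidate, so extra clauses in the candidate do not lower it.

```python
import re

# Hypothetical hand-written lexicon: word -> concept label.
CONCEPTS = {
    "giggle": "LAUGH", "giggling": "LAUGH", "laugh": "LAUGH",
    "joke": "JOKE", "jokes": "JOKE",
}

# Hypothetical context rule: "bad" and "poor" share a concept only
# when the sentence also mentions the JOKE concept.
CONTEXT_CONCEPTS = {
    ("bad", "JOKE"): "LOW_QUALITY",
    ("poor", "JOKE"): "LOW_QUALITY",
}


def concepts(sentence):
    """Return the set of concept labels found in a sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    base = {CONCEPTS[w] for w in words if w in CONCEPTS}
    contextual = {CONTEXT_CONCEPTS[(w, c)]
                  for w in words for c in base
                  if (w, c) in CONTEXT_CONCEPTS}
    return base | contextual


def coverage(query, candidate):
    """Fraction of the query's concepts that also occur in the candidate."""
    q = concepts(query)
    if not q:
        return 0.0
    return len(q & concepts(candidate)) / len(q)


s1 = "giggling at bad jokes"
s2 = "I like to laugh myself silly at poor jokes"
s3 = s2 + (", even though they are completely senseless, "
           "full of crap and serious chances of heart-attack!")

print(coverage(s1, s2), coverage(s1, s3))
print(coverage("a bad person", "a poor person"))
```

The context rule here works at sentence level rather than on adjacent words, which is a simplification; a real matcher would need something less brittle than fixed tables.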

So with that available, has any algorithm like this been devised yet? Or do I have to reinvent the wheel?

SexyBeast asked Feb 13 '13 11:02



1 Answer

You will need a more advanced topic modeling algorithm, and of course some corpora to train your model on, so that it can easily handle synonyms like giggle and laugh!

In Python, you can try this package: http://radimrehurek.com/gensim/. I have never used it, but it includes classic semantic vector space methods such as LSA/LSI, random projection, and even LDA.

My personal favourite is random projection, because it is faster and still very efficient (I'm doing it in Java with another library, though).
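For readers unfamiliar with random projection, here is a minimal standard-library sketch of the projection step only. It is not the gensim API and not the Java library mentioned above; the function names and the choice of random +1/-1 entries are assumptions made for illustration. Each vocabulary word gets a fixed random low-dimensional vector, and a sentence's bag-of-words vector is projected by summing them.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def build_projection(vocabulary, dim, seed=0):
    """Assign each vocabulary word a random +1/-1 vector of length dim."""
    rng = random.Random(seed)
    return {w: [rng.choice((-1.0, 1.0)) for _ in range(dim)]
            for w in sorted(vocabulary)}


def project(text, projection, dim):
    """Project the bag-of-words vector of text into dim dimensions."""
    vec = [0.0] * dim
    for word, count in Counter(tokenize(text)).items():
        row = projection.get(word)
        if row is None:  # word outside the vocabulary is ignored
            continue
        for i, r in enumerate(row):
            vec[i] += count * r
    return vec


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


docs = [
    "giggling at bad jokes",
    "I like to laugh myself silly at poor jokes",
    "the stock market fell sharply today",
]
dim = 16
vocab = {w for d in docs for w in tokenize(d)}
proj = build_projection(vocab, dim)
vecs = [project(d, proj, dim) for d in docs]

print(cosine(vecs[0], vecs[1]))
print(cosine(vecs[0], vecs[2]))
```

On its own this only approximately preserves word-overlap similarity in fewer dimensions. It does not learn that giggle and laugh are related; that still requires training on a corpus, as noted above.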

bendaizer answered Oct 22 '22 04:10
