 

Find sentences with similar relative meaning from a list of sentences against an example one

I want to be able to find sentences with the same meaning. I have a query sentence and a long list of millions of other sentences. Sentences are made of words, some of which are a special kind of word called a symbol: a word that stands for some object being talked about.

For example, my query sentence is:

Example: add (x) to (y) giving (z)

There may be a list of sentences already existing in my database, such as:

1. the sum of (x) and (y) is (z)
2. (x) plus (y) equals (z)
3. (x) multiplied by (y) does not equal (z)
4. (z) is the sum of (x) and (y)

The example should match sentences 1, 2 and 4 in my database, but not 3. Each match should also carry a weight (a similarity score).

It's not just math sentences; it's any sentence that can be compared to any other sentence based on the meaning of its words. I need a way to compare one sentence against many others and find the ones closest in meaning, i.e. a mapping between sentences based on their meaning.

Thanks! (I tagged this language-design because I couldn't create a new tag.)

Phil asked Dec 27 '22 21:12



2 Answers

First off: what you're trying to solve is a very hard problem. Depending on what's in your dataset, it may be AI-complete.

You'll need your program to know or learn that add, plus and sum refer to the same concept, while multiplies is a different concept. You may be able to do this by measuring the distance between the words' synsets in WordNet/FrameNet, though your distance calculation will have to be quite refined if you don't want multiplies to match as well. Otherwise, you may want to manually establish some word-concept mappings (such as {'add' : 'addition', 'plus' : 'addition', 'sum' : 'addition', 'times' : 'multiplication'}).
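A minimal sketch of the manual-mapping route, in Python. The concept table, the helper names `tokenize`, `concepts` and `concept_score`, and the choice of Jaccard similarity as the weight are all hypothetical, chosen for illustration; the table only covers the words in the question's example, and any word outside it is simply ignored.

```python
import re

# Hypothetical word -> concept table in the spirit of the mapping above.
# A real system would need a far larger table (or WordNet lookups).
CONCEPTS = {
    "add": "addition", "plus": "addition", "sum": "addition",
    "times": "multiplication", "multiplied": "multiplication",
    "multiplies": "multiplication",
    "giving": "result", "equals": "result", "equal": "result", "is": "result",
    "not": "negation",
}


def tokenize(sentence):
    """Split into symbols such as '(x)' and plain lowercase words."""
    return re.findall(r"\(\w+\)|\w+", sentence.lower())


def concepts(sentence):
    """Map each known word to its concept; unknown words and symbols are dropped."""
    return {CONCEPTS[w] for w in tokenize(sentence) if w in CONCEPTS}


def concept_score(query, candidate):
    """Jaccard similarity of the two concept sets, from 0.0 to 1.0."""
    q, c = concepts(query), concepts(candidate)
    if not q and not c:
        return 0.0
    return len(q & c) / len(q | c)


if __name__ == "__main__":
    query = "add (x) to (y) giving (z)"
    candidates = [
        "the sum of (x) and (y) is (z)",
        "(x) plus (y) equals (z)",
        "(x) multiplied by (y) does not equal (z)",
        "(z) is the sum of (x) and (y)",
    ]
    for sentence in candidates:
        print(f"{concept_score(query, sentence):.2f}  {sentence}")
```

With this table, sentences 1, 2 and 4 each score 1.0 against the example and sentence 3 scores 0.25, which gives the requested weight per match. The score ignores word order and which symbol plays which role, so it cannot tell (x) plus (y) equals (z) from (z) plus (y) equals (x); that is where parsing comes in.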

If you want full sentence semantics, you will also have to parse the sentences and derive their meaning from the parse trees/dependency graphs. The Stanford parser is a popular choice for parsing.

You can also find inspiration for this problem in Question Answering research. There, a common approach is to parse sentences, store fragments of the parse trees in an index, and search for them with common search-engine techniques (e.g. tf-idf, as implemented in Lucene). That will also give you a score for each sentence.
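A toy, in-memory version of that scoring step in Python: tf-idf weights compared with cosine similarity, standing in for a real engine such as Lucene. The `TfIdfIndex` class, the smoothed idf formula and the use of plain word tokens instead of parse-tree fragments are assumptions made to keep the sketch short; it is not how Lucene itself computes scores.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Keep symbols such as '(x)' as single tokens; lowercase everything else.
    return re.findall(r"\(\w+\)|\w+", sentence.lower())


class TfIdfIndex:
    """Hypothetical toy tf-idf index over a list of sentences."""

    def __init__(self, sentences):
        self.sentences = list(sentences)
        docs = [Counter(tokenize(s)) for s in self.sentences]
        n = len(docs)
        df = Counter(term for doc in docs for term in doc)
        # Smoothed idf, so terms found in every sentence keep a small weight.
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
        self.vectors = [self._weigh(doc) for doc in docs]

    def _weigh(self, counts):
        # tf * idf; terms never seen at indexing time get no weight.
        return {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}

    def search(self, query, top_k=3):
        """Return (cosine score, sentence) pairs, best first."""
        q = self._weigh(Counter(tokenize(query)))
        q_norm = math.sqrt(sum(w * w for w in q.values()))
        results = []
        for vec, sentence in zip(self.vectors, self.sentences):
            dot = sum(w * vec.get(t, 0.0) for t, w in q.items())
            norm = q_norm * math.sqrt(sum(w * w for w in vec.values()))
            results.append((dot / norm if norm else 0.0, sentence))
        results.sort(key=lambda pair: pair[0], reverse=True)
        return results[:top_k]


if __name__ == "__main__":
    index = TfIdfIndex([
        "the sum of (x) and (y) is (z)",
        "(x) plus (y) equals (z)",
        "(x) multiplied by (y) does not equal (z)",
        "(z) is the sum of (x) and (y)",
    ])
    for score, sentence in index.search("(x) plus (y)"):
        print(f"{score:.3f}  {sentence}")
```

Because tf-idf only sees surface tokens, it treats add and plus as unrelated: querying with the original example, add, to and giving never occur in the indexed sentences, so only the shared symbols contribute and the ranking says little about meaning. That is why the approach indexes parsed (or otherwise normalized) fragments rather than raw words.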

Fred Foo answered Dec 30 '22 09:12



You will need to reduce the words in your sentences to a common stem or synonym, then compare those stems and use the ratio of matching stems (e.g. 5 out of 10 words) against some threshold to decide whether a sentence is a match. For example, accept all sentences with a word match of over 80% (or whatever percentage you find accurate). At least, that is one way to do it.
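A rough Python sketch of this threshold approach. The synonym table, stopword list and `crude_stem` suffix stripper are hypothetical and hand-picked for the question's example (a real system would use a proper stemmer such as Porter and a synonym source such as WordNet); measuring the ratio against the query's stems, and reading "over 80%" as strictly greater than 0.8, are also assumptions.

```python
import re

# Hypothetical tables, hand-picked for the question's example only.
SYNONYMS = {
    "plus": "add", "sum": "add", "added": "add",
    "giving": "equal", "gives": "equal", "equals": "equal", "is": "equal",
}
STOPWORDS = {"the", "of", "and", "to", "by", "does"}


def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def stems(sentence):
    """Set of normalized stems; symbols such as '(x)' pass through unchanged."""
    words = re.findall(r"\(\w+\)|\w+", sentence.lower())
    return {SYNONYMS.get(w) or crude_stem(w) for w in words if w not in STOPWORDS}


def match_ratio(query, candidate):
    """Fraction of the query's stems that also appear in the candidate."""
    q, c = stems(query), stems(candidate)
    return len(q & c) / len(q) if q else 0.0


def is_match(query, candidate, threshold=0.8):
    """'Over 80%' read as strictly greater than the threshold."""
    return match_ratio(query, candidate) > threshold


if __name__ == "__main__":
    query = "add (x) to (y) giving (z)"
    for sentence in [
        "the sum of (x) and (y) is (z)",
        "(x) plus (y) equals (z)",
        "(x) multiplied by (y) does not equal (z)",
        "(z) is the sum of (x) and (y)",
    ]:
        print(f"{match_ratio(query, sentence):.2f}  {is_match(query, sentence)}  {sentence}")
```

On the question's example, sentences 1, 2 and 4 reach a ratio of 1.0, while sentence 3 lands at exactly 0.8: it shares the symbols and equal, and the not that reverses its meaning does not lower the ratio at all. Whether it is rejected therefore depends entirely on the threshold, which shows how sensitive this approach is to that choice.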

Charles Lambert answered Dec 30 '22 10:12
