Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there an algorithm that tells the semantic similarity of two phrases

input: phrase 1, phrase 2

output: semantic similarity value (between 0 and 1), or the probability these two phrases are talking about the same thing

like image 636
btw0 Avatar asked Sep 15 '08 12:09

btw0


People also ask

How do you find the semantic similarity between two sentences?

The easiest way of estimating the semantic similarity between a pair of sentences is by taking the average of the word embeddings of all words in the two sentences, and calculating the cosine between the resulting embeddings.

How do you evaluate semantic similarity?

We classify semantic similarity measures in three categories: path-based measures which rely on the hierarchical relations between the terms in a taxonomy; corpus-based information content (IC) measures which augment the path information with probabilities derived from a corpus of text; and taxonomy-based IC measures ...

What is similarity algorithm?

Similarity algorithms compute the similarity of pairs of nodes based on their neighborhoods or their properties. Several similarity metrics can be used to compute a similarity score. The Neo4j GDS library includes the following similarity algorithms: Node Similarity. Filtered Node Similarity.


3 Answers


You might want to check out this paper:

Sentence similarity based on semantic nets and corpus statistics (PDF)

I've implemented the algorithm described. Our context was very general (effectively any two English sentences) and we found the approach taken was too slow and the results, while promising, not good enough (or likely to be so without considerable, extra, effort).

You don't give a lot of context so I can't necessarily recommend this but reading the paper could be useful for you in understanding how to tackle the problem.

Regards,

Matt.

like image 193
Matt Mower Avatar answered Sep 21 '22 07:09

Matt Mower


There's a short and a long answer to this.

The short answer:

Use the WordNet::Similarity Perl package. If Perl is not your language of choice, check the WordNet project page at Princeton, or google for a wrapper library.

The long answer:

Determining word similarity is a complicated issue, and research is still very hot in this area. To compute similarity, you need an appropriate represenation of the meaning of a word. But what would be a representation of the meaning of, say, 'chair'? In fact, what is the exact meaning of 'chair'? If you think long and hard about this, it will twist your mind, you will go slightly mad, and finally take up a research career in Philosophy or Computational Linguistics to find the truth™. Both philosophers and linguists have tried to come up with an answer for literally thousands of years, and there's no end in sight.

So, if you're interested in exploring this problem a little more in-depth, I highly recommend reading Chapter 20.7 in Speech and Language Processing by Jurafsky and Martin, some of which is available through Google Books. It gives a very good overview of the state-of-the-art of distributional methods, which use word co-occurrence statistics to define a measure for word similarity. You are not likely to find libraries implementing these, however.

like image 39
nfelger Avatar answered Sep 24 '22 07:09

nfelger


For anyone just coming at this, i would suggest taking a look at SEMILAR - http://www.semanticsimilarity.org/ . They implement a lot of the modern research methods for calculating word and sentence similarity. It is written in Java.

SEMILAR API comes with various similarity methods based on Wordnet, Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), BLEU, Meteor, Pointwise Mutual Information (PMI), Dependency based methods, optimized methods based on Quadratic Assignment, etc. And the similarity methods work in different granularities - word to word, sentence to sentence, or bigger texts.

like image 9
kyrenia Avatar answered Sep 20 '22 07:09

kyrenia