Algorithm to compare similarity of English sentences

Tags: algorithm

I have a collection of sentences, and I need to analyse them to see how similar they are.

Are there any established algorithms to do this?

I care about:

  • containing the same words (ignoring inflexions for now)
  • containing the same words in a similar order

I've used Levenshtein distance and n-grams for spelling correction before, but I'm not sure whether they carry over to this problem.

As a first pass, I don't care about spelling differences: typos can be treated as different words, although it would be nice to account for them eventually.

Perhaps a hybrid, splitting each sentence at spaces and then applying one of the above (or another) algorithms to the resulting words, would be a starting point.
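
A minimal sketch of that hybrid idea in Python, for illustration only (it is not from the original post). It assumes simple lowercase word tokens and no stemming, so inflexions and typos count as different words. The function names (`tokenize`, `word_edit_distance`, `order_similarity`, `ngram_jaccard`) are made up for this example.

```python
import re


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def word_edit_distance(a, b):
    """Levenshtein distance over lists of words rather than characters."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete word_a
                            curr[j - 1] + 1,      # insert word_b
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def order_similarity(s1, s2):
    """Order-sensitive score in [0, 1]: 1 - distance / longer length."""
    a, b = tokenize(s1), tokenize(s2)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - word_edit_distance(a, b) / longest


def ngram_jaccard(s1, s2, n=2):
    """Jaccard overlap of word n-grams; n=1 ignores word order entirely."""
    a, b = tokenize(s1), tokenize(s2)
    grams_a = {tuple(a[i:i + n]) for i in range(len(a) - n + 1)}
    grams_b = {tuple(b[i:i + n]) for i in range(len(b) - n + 1)}
    union = grams_a | grams_b
    if not union:
        return 1.0
    return len(grams_a & grams_b) / len(union)
```

With `n=1`, `ngram_jaccard` only measures shared vocabulary (the first requirement), while `order_similarity` and larger `n` penalise reordering (the second). Running the words through a stemmer before comparing would be one way to ignore inflexions.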

What options are available? Any advice?

Thanks!

Andrew Bullock asked Jul 15 '11 08:07


1 Answer

This paper compares several sentence similarity measures. Perhaps you can use one of them as is, or modify it for your needs.

Otherwise, "sentence similarity measure" is a good search term to start with.

Szabolcs answered Oct 31 '22 03:10