
Word-level edit distance of a sentence

Is there an algorithm that finds the word-level edit distance between two sentences? For example, going from "A Big Fat Dog" to "The Big House with the Fat Dog" takes 1 substitution and 3 insertions.

AutoC asked Feb 20 '11 07:02



1 Answer

In general, this is called the sequence alignment problem. It does not matter what entities you align - bits, characters, words, or DNA bases - because an algorithm that works for one type of item works for any other. For word-level edit distance, split each sentence into words and treat every word as a single symbol. What matters is whether you want global or local alignment.

Global alignment, which attempts to align every residue in every sequence, is most useful when the sequences are similar and of roughly equal size. A general global alignment technique is the Needleman-Wunsch algorithm, which is based on dynamic programming. When people talk about Levenshtein distance, they usually mean global alignment. The algorithm is so straightforward that several people discovered it independently; you may also come across the Wagner-Fischer algorithm, which is essentially the same thing but is mentioned more often in the context of edit distance between two strings of characters.
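As a minimal sketch of the global case, here is a Wagner-Fischer style dynamic program applied to words instead of characters. The function name `word_edit_distance`, the whitespace tokenization, the case-sensitive comparison, and the unit cost for every operation are assumptions made for illustration, not something the question specifies.

```python
def word_edit_distance(a: str, b: str) -> int:
    """Word-level Levenshtein distance (unit cost per insert/delete/substitute)."""
    s, t = a.split(), b.split()  # assumption: tokens are whitespace-separated
    # prev[j] = distance between the words of s seen so far and the first j words of t
    prev = list(range(len(t) + 1))
    for i, sw in enumerate(s, start=1):
        curr = [i]  # turning i words into an empty sentence costs i deletions
        for j, tw in enumerate(t, start=1):
            cost = 0 if sw == tw else 1  # assumption: exact, case-sensitive match
            curr.append(min(
                prev[j] + 1,         # delete sw
                curr[j - 1] + 1,     # insert tw
                prev[j - 1] + cost,  # keep or substitute
            ))
        prev = curr
    return prev[-1]


print(word_edit_distance("A Big Fat Dog", "The Big House with the Fat Dog"))  # 4
```

The result for the sentences in the question is 4, which matches the 1 substitution plus 3 insertions counted there. Keeping only two rows of the table gives O(N * M) time and O(M) memory for sentences of N and M words.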

Local alignment is more useful for dissimilar sequences that are suspected to contain regions of similarity or similar sequence motifs within their larger sequence context. The Smith-Waterman algorithm is a general local alignment method, also based on dynamic programming. It is rarely used in natural language processing and is much more common in bioinformatics.
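For comparison, a rough sketch of Smith-Waterman scoring over word lists might look like the following. The function name `local_alignment_score` and the scoring values (+2 for a match, -1 for a mismatch, -1 per gap) are arbitrary assumptions chosen for illustration; real applications tune them. The sketch returns only the best local score, not the aligned region.

```python
def local_alignment_score(s, t, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score between two token lists."""
    best = 0
    prev = [0] * (len(t) + 1)  # scores for the previous row of the DP table
    for sw in s:
        curr = [0]
        for j, tw in enumerate(t, start=1):
            diag = prev[j - 1] + (match if sw == tw else mismatch)
            # Clamping at 0 lets an alignment restart anywhere; this is what
            # makes the alignment local rather than global.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


s = "A Big Fat Dog".split()
t = "The Big House with the Fat Dog".split()
print(local_alignment_score(s, t))  # 4, from the shared region "Fat Dog"
```

Unlike the edit distance, this score rewards the best-matching region and ignores everything outside it, so it answers a different question than the one asked.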

Alexander Solovets answered Sep 21 '22 06:09