Hamming Distance vs. Levenshtein Distance

For the problem I'm working on, finding distances between two sequences to determine their similarity, sequence order is very important. However, my sequences are not all the same length, so I pad the shorter one with empty points until both are the same length, which satisfies the Hamming distance requirement. Is there any major problem with doing this, given that all I care about is the number of transpositions (not insertions or deletions, as Levenshtein counts)?

I've found that Hamming distance is much, much faster than Levenshtein as a distance metric for longer sequences. When should one use Levenshtein distance (or one of its derivatives) instead of the much cheaper Hamming distance? Hamming distance can be considered an upper bound on the Levenshtein distance between two sequences of equal length, so if I am comparing two sequences for an order-biased similarity metric rather than the absolute minimal number of moves to match them, there isn't an apparent reason for me to choose Levenshtein over Hamming as a metric, is there?

202

asked Jan 03 '11 21:01

don

2 Answers

That really depends on the type of sequences you are matching and on the result you want.

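If it's not a problem that "1234567890" and "0123456789" are considered totally different (every position differs, so the Hamming distance is 10, while the Levenshtein distance is only 2), then Hamming distance is indeed fine.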
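To make the difference concrete, here is a minimal Python sketch comparing the two metrics on that pair. The function names hamming and levenshtein are my own, not from any library, and the Levenshtein part is the plain textbook dynamic-programming version rather than an optimized one.

```python
def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions (two-row DP)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


print(hamming("1234567890", "0123456789"))      # 10
print(levenshtein("1234567890", "0123456789"))  # 2
```

A single shift misaligns every position for Hamming, while Levenshtein absorbs it as one deletion plus one insertion. The price is time: the sketch above takes time proportional to len(a) * len(b), against a single linear pass for Hamming.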

180

answered Oct 02 '22 09:10

Johan Kotlinski

In addition to Johan's correct answer, the padding itself can be problematic.

For example, when you compare 123 to 123456, the result depends on whether you pad at the start or at the end of the string. Counting matching positions, the similarity of ___123 with 123456 is 0, but the similarity of 123___ with 123456 is 3.
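A small Python sketch of that effect, assuming "_" as the padding symbol and a hypothetical helper matches that counts equal positions:

```python
PAD = "_"  # assumed padding symbol; it must not occur in the real data


def matches(a, b):
    """Number of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b))


short, long_ = "123", "123456"
width = len(long_)

print(matches(short.rjust(width, PAD), long_))  # "___123" vs "123456": 0
print(matches(short.ljust(width, PAD), long_))  # "123___" vs "123456": 3
```

So with padding, the score depends on an alignment choice you make up front, whereas an edit distance that allows insertions and deletions finds the alignment for you.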

answered Oct 02 '22 07:10

David Weinberg

Tags: algorithm, diff, nlp, levenshtein-distance, hamming-distance
