Normalizing the edit distance

Tags: string-matching, algorithm, ranking, levenshtein-distance, edit-distance

Can we normalize the Levenshtein edit distance by dividing it by the lengths of the two strings? I am asking because, when we compare two strings of unequal length, the difference between their lengths is counted as well. For example, ed('has a', 'has a ball') = 5 and ed('has a', 'has a ball the is round') = 18. As one string gets longer, the edit distance increases even though the strings are similar. Because of this, I cannot decide what threshold counts as a good edit distance value.
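For reference, here is a minimal sketch of the standard unit-cost Levenshtein distance (insertions, deletions and substitutions each cost 1), computed with the usual two-row dynamic program. The function name `levenshtein` is an illustrative choice, not a library API. Because 'has a' is a prefix of both longer strings, the distance is simply the number of appended characters, so it grows with the length of the longer string.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance via the classic two-row DP."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("has a", "has a ball"))               # 5
print(levenshtein("has a", "has a ball the is round"))  # 18
```

Only two rows of the DP table are kept, so memory is O(len(b)) while time stays O(len(a) * len(b)).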


asked Aug 20 '17 14:08

Naufal Khalid

1 Answer

Yes, normalizing the edit distance is one way to put the differences between strings on a single scale from "identical" to "nothing in common".

A few things to consider:

- Whether the normalized distance is a better measure of similarity between strings depends on the application. If the question is "how likely is this word to be a misspelling of that word?", normalization is a good way to go. If it is "how much has this document changed since the last version?", the raw edit distance may be a better option.
- If you want the result to be in the range [0, 1], you need to divide the distance by the maximum possible distance between two strings of the given lengths. That is, length(str1) + length(str2) for the LCS distance and max(length(str1), length(str2)) for the Levenshtein distance.
- The normalized distance is not a metric, as it violates the triangle inequality.
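A minimal sketch of the two normalizations described above, assuming unit costs for every edit operation. Here `lcs_distance` counts only insertions and deletions, so it equals len(a) + len(b) - 2 * (LCS length); all helper names are hypothetical. The strings 'ab', 'ba' and 'aba' are a small example chosen for illustration: going "through" 'aba' comes out shorter than going directly, because each leg of the detour is divided by a larger denominator, which is exactly a triangle inequality violation.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (two-row DP)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_distance(a: str, b: str) -> int:
    """Insert/delete-only edit distance."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def normalized_levenshtein(a: str, b: str) -> float:
    # max(len) is the largest possible Levenshtein distance for these lengths
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0  # two empty strings are identical


def normalized_lcs_distance(a: str, b: str) -> float:
    # len(a) + len(b) is the largest possible LCS distance (no common characters)
    total = len(a) + len(b)
    return lcs_distance(a, b) / total if total else 0.0


x, y, z = "ab", "ba", "aba"
for name, d in [("Levenshtein / max", normalized_levenshtein),
                ("LCS distance / sum", normalized_lcs_distance)]:
    direct, detour = d(x, y), d(x, z) + d(z, y)
    print(f"{name}: d(x,y)={direct:.3f}, d(x,z)+d(z,y)={detour:.3f}")
# Levenshtein / max: d(x,y)=1.000, d(x,z)+d(z,y)=0.667
# LCS distance / sum: d(x,y)=0.500, d(x,z)+d(z,y)=0.400
```

In both cases the direct distance exceeds the sum of the two legs, so neither normalized distance is a metric, even though both stay within [0, 1].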


answered Nov 10 '22 12:11

Anton
