I am trying to match a single search term against a dictionary of possible matches using a Levenshtein distance algorithm. The algorithm returns a distance expressed as the number of operations required to convert the search string into the matched string. I want to present the results as a ranked percentage list of the top "N" (say 10) matches.
Since the search string can be longer or shorter than the individual dictionary strings, what would be an appropriate way of expressing the distance as a percentage that qualitatively reflects how close each result is to the query string, with 100% indicating an exact match?
I considered the following options:
Q   = query string
Mi  = matched string (i-th dictionary entry)
PMi = percentage match for Mi

Option 1: PMi = (1 - Lev_distance(Q, Mi) / strlen(Q)) * 100
Option 2: PMi = (1 - Lev_distance(Q, Mi) / max(strlen(Q), strlen(Mi))) * 100
Option 1 can produce negative percentages when the distance is greater than the length of the search string, which happens when the matched string is much longer. For example, the query "ABC" matched against "ABC Corp." would yield a negative match percentage.
Option 2 does not appear to give a consistent percentage across a set of Mi, as each calculation may use a different denominator, so the resulting percentage values are not normalized to a common scale.
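To make the problem concrete, here is a quick Python sketch of the two options applied to the example above; the distance of 6 is simply the six characters that have to be inserted to turn "ABC" into "ABC Corp." (the function and variable names are mine, for illustration only):

def pm_option1(q, m, dist):
    # Option 1: normalize the distance by the query length only
    return (1 - dist / len(q)) * 100

def pm_option2(q, m, dist):
    # Option 2: normalize the distance by the longer of the two strings
    return (1 - dist / max(len(q), len(m))) * 100

# Lev_distance("ABC", "ABC Corp.") == 6 (six characters inserted)
print(pm_option1("ABC", "ABC Corp.", 6))   # -100.0 -> Option 1 goes negative
print(pm_option2("ABC", "ABC Corp.", 6))   # ~33.3  -> Option 2 stays within 0..100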
The only other way I can think of is to ditch comparing the Levenshtein distance to either string length, and instead present the comparative distances of the top "N" matches as an inverse percentile rank (100 - percentile rank).
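Roughly, that would amount to something like this Python sketch (assuming the common "percentage of values strictly below" definition of percentile rank), which only produces a relative score within the candidate set:

def inverse_percentile_scores(distances):
    # Score each top-N distance as 100 - percentile rank,
    # so the smallest distance gets the highest score.
    n = len(distances)
    scores = []
    for d in distances:
        below = sum(1 for other in distances if other < d)
        scores.append(100.0 - 100.0 * below / n)
    return scores

# Example: Levenshtein distances of the top 5 matches
print(inverse_percentile_scores([1, 2, 2, 5, 9]))  # [100.0, 80.0, 80.0, 40.0, 20.0]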
Any thoughts? Are there better approaches? I must be missing something, as Levenshtein distance is probably the most common algorithm for fuzzy matching and this must be a very common problem.
The Levenshtein distance is usually calculated by preparing a matrix of size (M+1) x (N+1), where M and N are the lengths of the two words, and looping through that matrix with two nested for loops, performing a small calculation in each iteration.
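For reference, a plain textbook version of that matrix computation, written here in Python (not optimized; a production version would typically keep only two rows of the matrix), looks like this:

def levenshtein(a, b):
    m, n = len(a), len(b)
    # dp[i][j] = distance between the first i chars of a and the first j chars of b
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                      # delete all i characters of a
    for j in range(n + 1):
        dp[0][j] = j                      # insert all j characters of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[m][n]

print(levenshtein("ABC", "ABC Corp."))  # 6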
Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. It is named after the Soviet mathematician Vladimir Levenshtein, who considered this distance in 1965.
Different definitions of an edit distance use different sets of string operations. Levenshtein distance operations are the removal, insertion, or substitution of a character in the string. Because it is the most common metric, the term Levenshtein distance is often used interchangeably with edit distance.
Used as a metric, the Levenshtein distance can boost the accuracy of an NLP model by verifying each named entity in an entry: a vector search finds the most similar entry as defined by the vectorization, and the edit distance then confirms how close that match actually is.
I had a similar problem and this thread helped me to figure out a solution. Hope it can help others too.
int levDis = Lev_distance(Q, Mi);
int bigger = max(strlen(Q), strlen(Mi));
// cast to double to avoid integer division (which would always give 0 or 1)
double pct = (double)(bigger - levDis) / bigger;
It should return 100% if both strings are exactly the same and 0% if they are totally different.
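For anyone doing this in Python, the same calculation could look like the sketch below; it assumes the third-party python-Levenshtein package for the distance, but any implementation works (such as the matrix version earlier in this thread), and Python's / is already floating-point division, so no cast is needed:

import Levenshtein  # assumes the third-party python-Levenshtein package is installed

def similarity_pct(q, m):
    lev_dis = Levenshtein.distance(q, m)
    bigger = max(len(q), len(m))
    return 100.0 * (bigger - lev_dis) / bigger

print(similarity_pct("ABC", "ABC"))        # 100.0 (exact match)
print(similarity_pct("ABC", "ABC Corp."))  # ~33.3
print(similarity_pct("ABC", "XYZ"))        # 0.0 (totally different)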
(Sorry if my English isn't that good.)