If so, please explain how.
Re: what is distance -- "The distance between two strings is defined as the minimal number of edits required to convert one into the other."
For example, xyz to XYZ would take 3 edits, so the string xYZ is closer to XYZ and xyz.
If the pattern is [0-9]{3} or for instance 123, then a23 would be closer to the pattern than ab3.
How can you find the shortest distance between a regexp and a non-matching string?
The above is the Damerau–Levenshtein distance algorithm.
The Levenshtein distance between two strings is the number of single character deletions, insertions, or substitutions required to transform one string into the other. This is also known as the edit distance. Vladimir Levenshtein is a Russian mathematician who published this notion in 1966.
Different definitions of an edit distance use different sets of string operations. Levenshtein distance operations are the removal, insertion, or substitution of a character in the string. Being the most common metric, the term Levenshtein distance is often used interchangeably with edit distance.
There are several ways to measure the distance between two strings. The simplest one is to use hamming distance to find the number of mismatch between two strings. However, the two strings must have the same length.
You can use Finite State Machines to do this efficiently (that is, linear in time). If you use a transducer, you can even write the specification of the transformation fairly compactly and do far more nuanced transformations than simply inserts or deletes - see wikipedia for Finite State Transducer as a starting point, and software such as the FSA toolkit or FSA6 (which has a not entirely stable web-demo) too. There are lots of libraries for FSA manipulation; I don't want to suggest the previous two are your only or best options, just two I've heard of.
If, however, you merely want the efficient, approximate searching, a less flexibly but already-implemented-for-you option exists: TRE, which has an approximate matching function that returns the cost of the match - i.e., the distance to the match, from your perspective.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With