
Reverse Levenshtein distance

With Levenshtein distance you normally ask: given these two strings, what is their Levenshtein distance? How would you go about the reverse, taking a string and a Levenshtein distance and generating all the strings within that distance? (It would also take in a character set.) So if I pass in a string x and a distance d, it would give me all the strings within that edit distance, including those at distances d-1, d-2, ..., d-n (n < d).

Expected functionality:

>>> getWithinDistance('apple',2,{'a','b',' '})
['applea','appleb','appel','app le'...]

Please note that the program is able to produce 'app le' because the space character is included in the character set.

Anshu Dwibhashi asked Nov 27 '13 10:11


1 Answer

There's a data structure that does this, called the Levenshtein automaton. You construct it from a set of strings (which may have only one member) and a fixed distance k, and then you can query it for all strings within distance k of any of the strings it stores. A Python implementation is discussed here.
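To give a feel for the idea, here is a rough sketch that is not a full automaton construction. It walks over candidate strings one character at a time and keeps the current row of the usual Levenshtein dynamic-programming table, which plays the role of the automaton state. A branch is abandoned as soon as every entry in the row exceeds k. The function name `within_distance_by_rows` and the `alphabet` parameter are made up for this example, and it assumes the alphabet contains every character allowed to appear in the output (so you would normally include the letters of the word itself).

```python
def within_distance_by_rows(word, k, alphabet):
    """List all strings over `alphabet` whose Levenshtein distance to `word` is <= k.

    Hypothetical helper for illustration: the DP row stands in for the
    state of a Levenshtein automaton that is simulated lazily.
    """
    results = []

    def extend(prefix, row):
        # row[j] is the distance between `prefix` and word[:j].
        if row[-1] <= k:
            results.append(prefix)
        for c in alphabet:
            new_row = [row[0] + 1]
            for j in range(1, len(word) + 1):
                cost = 0 if word[j - 1] == c else 1
                new_row.append(min(new_row[j - 1] + 1,   # skip word[j-1]
                                   row[j] + 1,           # extra character c
                                   row[j - 1] + cost))   # match or substitute
            # If every entry exceeds k, no extension can come back within k.
            if min(new_row) <= k:
                extend(prefix + c, new_row)

    extend('', list(range(len(word) + 1)))
    return results
```

For example, `within_distance_by_rows('apple', 2, sorted(set('apple') | {'a', 'b', ' '}))` would include 'app le' and 'appel'. The output grows quickly with k and with the alphabet size, so this is only practical for small values.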

Alternatively, you can do a depth-limited search with backtracking for such strings.
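A minimal sketch of that depth-limited search, assuming that inserted and substituted characters may only come from the given character set (one reading of the question). The names `single_edits` and `get_within_distance` are chosen for this example, and the result is returned as a set rather than a list.

```python
def single_edits(s, charset):
    """Yield every string one insertion, deletion or substitution away from s."""
    for i in range(len(s) + 1):
        for c in charset:
            yield s[:i] + c + s[i:]              # insertion
    for i in range(len(s)):
        yield s[:i] + s[i + 1:]                  # deletion
        for c in charset:
            if c != s[i]:
                yield s[:i] + c + s[i + 1:]      # substitution


def get_within_distance(word, max_dist, charset):
    """Return the set of strings reachable from word in at most max_dist edits."""
    best = {}  # string -> largest remaining budget it was expanded with

    def visit(s, budget):
        # Skip strings already explored with at least as much budget left.
        if best.get(s, -1) >= budget:
            return
        best[s] = budget
        if budget == 0:
            return
        for t in single_edits(s, charset):
            visit(t, budget - 1)

    visit(word, max_dist)
    return set(best)
```

With `get_within_distance('apple', 2, {'a', 'b', ' '})` the result contains 'applea', 'appleb' and 'app le', plus 'apple' itself at distance 0. Under this reading 'appel' only shows up if 'l' and 'e' are also in the character set, since it needs two substitutions that use those letters.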

Fred Foo answered Sep 19 '22 18:09