Edit distance such as Levenshtein taking into account proximity on keyboard

Is there an edit distance such as Levenshtein which takes into account distance for substitutions?

For example, when judging whether two words are the same, typo and tylo are really close (p and l are physically close on the keyboard), while typo and tyqo are far apart. I'd like to assign a smaller distance to more likely typos.

There must be a metric that takes this kind of proximity into account?

260

asked Mar 24 '15 13:03

PascalVKooten

1 Answer

Levenshtein distance does not account for this kind of distance on its own, but you can combine it with a helper such as Euclidean or Manhattan distance to get the result. My simple assumption is that q (on the English QWERTY layout) sits at the Cartesian origin (y=0; x=0), so w is at (y=0; x=1), and so on. Here is the list, abbreviated:

    keyboard_cartesian = {
        'q': {'y': 0, 'x': 0},
        'w': {'y': 0, 'x': 1},
        'e': {'y': 0, 'x': 2},
        'r': {'y': 0, 'x': 3},
        # ...
        'a': {'y': 1, 'x': 0},
        # ...
        'z': {'y': 2, 'x': 0},
        'x': {'x': 1, 'y': 2},
        # ...
    }

Assume the word qaz has a meaning. The Levenshtein distance between qaz and each of waz and eaz is 1. To check which misspelling is more likely, take the differing characters (here (q, w) and (q, e)) and calculate the Euclidean distance between them:

    >>> from math import sqrt
    >>> def euclidean_distance(a, b):
    ...     X = (keyboard_cartesian[a]['x'] - keyboard_cartesian[b]['x'])**2
    ...     Y = (keyboard_cartesian[a]['y'] - keyboard_cartesian[b]['y'])**2
    ...     return sqrt(X + Y)
    ...
    >>> euclidean_distance('q', 'w')
    1.0
    >>> euclidean_distance('q', 'e')
    2.0

This means that misspelling qaz as waz is more likely than misspelling it as eaz.
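If you want a single number rather than a tie-breaker, one option is to feed the key distance into the Levenshtein recurrence as the substitution cost. The following is only a rough sketch under my own assumptions, not a standard metric: the names `QWERTY_ROWS`, `KEY_POS`, `substitution_cost` and `keyboard_levenshtein` are made up for illustration, the layout is the three letter rows of a US QWERTY keyboard on an integer grid (row stagger ignored), insertions and deletions cost 1, and substitutions are scaled into the range 0 to 1 by dividing by the largest key-to-key distance.

```python
from math import sqrt

# Assumed layout: the three letter rows of a US QWERTY keyboard on an
# integer grid, ignoring the physical stagger between rows.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

# Map each letter to its (x, y) grid position, e.g. 'q' -> (0, 0).
KEY_POS = {
    ch: (x, y)
    for y, row in enumerate(QWERTY_ROWS)
    for x, ch in enumerate(row)
}


def key_distance(a, b):
    """Euclidean distance between two keys on the assumed grid."""
    (ax, ay), (bx, by) = KEY_POS[a], KEY_POS[b]
    return sqrt((ax - bx) ** 2 + (ay - by) ** 2)


# Largest distance between any two keys, used to scale costs into [0, 1].
MAX_KEY_DISTANCE = max(key_distance(a, b) for a in KEY_POS for b in KEY_POS)


def substitution_cost(a, b):
    """0 for equal characters, up to 1 for the farthest-apart keys."""
    if a == b:
        return 0.0
    if a not in KEY_POS or b not in KEY_POS:
        return 1.0  # unknown character: fall back to the plain Levenshtein cost
    return key_distance(a, b) / MAX_KEY_DISTANCE


def keyboard_levenshtein(s, t):
    """Levenshtein distance with keyboard-weighted substitutions.

    Insertions and deletions cost 1; substitutions cost between 0 and 1.
    """
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = [float(j) for j in range(len(t) + 1)]
    for i, cs in enumerate(s, start=1):
        curr = [float(i)]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1.0,                            # delete cs
                curr[j - 1] + 1.0,                        # insert ct
                prev[j - 1] + substitution_cost(cs, ct),  # substitute cs -> ct
            ))
        prev = curr
    return prev[-1]
```

With this sketch, `keyboard_levenshtein("typo", "tylo")` comes out at roughly 0.15 and `keyboard_levenshtein("typo", "tyqo")` at roughly 0.98, so the nearby-key typo is ranked as much closer. The scaling is a design choice: you may prefer a floor on the substitution cost so that adjacent keys are not almost free, or Manhattan distance instead of Euclidean.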

120

answered Oct 04 '22 02:10

marmeladze
