Modify Levenshtein-Distance to ignore order

Tags:

python

algorithm

levenshtein-distance

edit-distance

I'm looking to compute the Levenshtein distance between sequences containing up to 6 values. The order of these values should not affect the distance.

How would I build this into the iterative or recursive algorithm?

Example:

# Currently 
>>> LDistance('dog', 'god')
2

# Sorted
>>> LDistance('dgo', 'dgo')
0

# Proposed
>>> newLDistance('dog', 'god')
0

'dog' and 'god' have exactly the same letters, so sorting the strings beforehand returns the desired result. However, this doesn't always work:

# Currently 
>>> LDistance('doge', 'gold')
3

# Sorted
>>> LDistance('dego', 'dglo')
2

# Proposed
>>> newLDistance('doge', 'gold')
1

'doge' and 'gold' share 3 of their 4 letters, so the distance should be 1. Here is my current recursive code:

def mLD(s, t):
    memo = {}
    def ld(s, t):
        if not s: return len(t)
        if not t: return len(s)
        if s[0] == t[0]: return ld(s[1:], t[1:])
        if (s, t) not in memo:
            l1 = ld(s, t[1:])
            l2 = ld(s[1:], t)
            l3 = ld(s[1:], t[1:])
            memo[(s,t)] = 1 + min(l1, l2, l3)
        return memo[(s,t)]
    return ld(s, t)

EDIT: Followup question: Adding exceptions to Levenshtein-Distance-like algorithm

asked Sep 08 '15 11:09

Luis

2 Answers

You don't need the Levenshtein machinery for this.

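import collections

def distance(s1, s2):
    # Net count per character: positive means a surplus in s1,
    # negative means a surplus in s2.
    cnt = collections.Counter()
    for c in s1:
        cnt[c] += 1
    for c in s2:
        cnt[c] -= 1
    # Half the total imbalance plus half the length difference (rounded up)
    # equals the larger of the two surpluses.
    return sum(abs(diff) for diff in cnt.values()) // 2 + \
        (abs(sum(cnt.values())) + 1) // 2   # can be omitted if len(s1) == len(s2)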
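
The return expression can look opaque, so here is a small sanity check of the arithmetic behind it. Write `a` for the total surplus of characters in `s1` and `b` for the total surplus in `s2` (these names, and the helper `two_term`, are illustrative and not part of the answer). Then `sum(abs(diff) ...)` is `a + b` and `abs(sum(cnt.values()))` is `abs(a - b)`. The two always have the same parity, so the rounded-down half of one plus the rounded-up half of the other should come out to `max(a, b)`. A minimal sketch that checks this for the sizes mentioned in the question:

```python
def two_term(a, b):
    # a: surplus characters in s1, b: surplus characters in s2 (illustrative names).
    # Mirrors the shape of the answer's return expression.
    return (a + b) // 2 + (abs(a - b) + 1) // 2

# Sequences hold at most 6 values, so each surplus ranges over 0..6.
assert all(two_term(a, b) == max(a, b) for a in range(7) for b in range(7))
```

When both strings have the same length, `a == b` and the second term is zero, which is why the comment in the answer says it can be omitted in that case.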

answered Oct 11 '22 04:10

David Eisenstat

Why not just count how many letters are in common, and find an answer from that? For each character, calculate its frequency in each string. Then, for each string, count how many "extra" characters it has compared with the other string, and take the maximum of the two "extra" counts.

Pseudocode:

for c in s1:
    cnt1[c]++
for c in s2:
    cnt2[c]++
extra1 = 0
extra2 = 0
for c in all_chars:
    if cnt1[c]>cnt2[c]
        extra1 += cnt1[c]-cnt2[c]
    else
        extra2 += cnt2[c]-cnt1[c]
return max(extra1, extra2)
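
The pseudocode above can be sketched in Python roughly as follows. The function name `unordered_distance` is made up for this example, and the sketch assumes the inputs are strings or other sequences of hashable values. It relies on the fact that subtracting one `collections.Counter` from another keeps only the positive counts.

```python
from collections import Counter

def unordered_distance(s1, s2):
    """Order-insensitive edit distance between two sequences (illustrative name)."""
    cnt1, cnt2 = Counter(s1), Counter(s2)
    # Counter subtraction drops zero and negative counts, so each difference
    # holds only the characters one side has in excess of the other.
    extra1 = sum((cnt1 - cnt2).values())
    extra2 = sum((cnt2 - cnt1).values())
    # Each substitution removes one extra from both sides; whatever remains
    # on the larger side needs one insertion or deletion each.
    return max(extra1, extra2)

print(unordered_distance('dog', 'god'))    # 0
print(unordered_distance('doge', 'gold'))  # 1
```

Both results match the "Proposed" values in the question.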

answered Oct 11 '22 04:10

Petr
