Edit Distance in Python

I'm writing a spellchecker in Python. I have a list of valid words (the dictionary), and I need to output the words from this dictionary that have an edit distance of 2 from a given invalid word.

I know I need to start by generating a list of words with an edit distance of 1 from the invalid word (and then run that again on all the generated words). I have three methods, inserts(...), deletions(...) and changes(...), that should each output a list of words with an edit distance of 1: inserts outputs all valid words with one more letter than the given word, deletions outputs all valid words with one fewer letter, and changes outputs all valid words with one different letter.

I've checked a bunch of places, but I can't seem to find an algorithm that describes this process. All the ideas I've come up with involve looping through the dictionary list multiple times, which would be extremely time-consuming. If anyone could offer some insight, I'd be extremely grateful.

asked Mar 17 '10 06:03

Mel

2 Answers

What you are describing is called edit distance, and there is a nice explanation of it on Wikipedia. There are many ways to define a distance between two words; the one you want is called the Levenshtein distance. Here is a DP (dynamic programming) implementation in Python.

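def levenshteinDistance(s1, s2):
    # Keep s1 as the shorter string so each DP row is as small as possible.
    if len(s1) > len(s2):
        s1, s2 = s2, s1

    # distances[i] is the edit distance between s1[:i] and the part of s2 seen so far.
    distances = list(range(len(s1) + 1))
    for i2, c2 in enumerate(s2):
        distances_ = [i2 + 1]
        for i1, c1 in enumerate(s1):
            if c1 == c2:
                # Characters match: no extra cost.
                distances_.append(distances[i1])
            else:
                # Cheapest of substitution, deletion, insertion, plus one edit.
                distances_.append(1 + min((distances[i1], distances[i1 + 1], distances_[-1])))
        distances = distances_
    return distances[-1]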

A couple more implementations are here.
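To tie this back to the spellcheck question, here is a minimal sketch of how a distance function like this could be used to pick dictionary words within a given distance of a misspelled word. The names levenshtein_distance and words_within, the max_distance parameter, and the sample word list are all made up for illustration. It scans the dictionary once and skips any word whose length differs by more than max_distance, since the length difference is a lower bound on the edit distance.

```python
def levenshtein_distance(s1, s2):
    """Row-by-row DP edit distance between two strings."""
    if len(s1) > len(s2):
        s1, s2 = s2, s1
    previous = list(range(len(s1) + 1))
    for i2, c2 in enumerate(s2):
        current = [i2 + 1]
        for i1, c1 in enumerate(s1):
            if c1 == c2:
                current.append(previous[i1])
            else:
                current.append(1 + min(previous[i1], previous[i1 + 1], current[-1]))
        previous = current
    return previous[-1]


def words_within(word, dictionary, max_distance=2):
    """Return dictionary words at edit distance <= max_distance from word.

    Hypothetical helper: one pass over the dictionary, in dictionary order.
    """
    matches = []
    for candidate in dictionary:
        # The length difference can never exceed the edit distance,
        # so this cheap check avoids running the DP on hopeless words.
        if abs(len(candidate) - len(word)) > max_distance:
            continue
        if levenshtein_distance(word, candidate) <= max_distance:
            matches.append(candidate)
    return matches


# Illustrative data only.
dictionary = ["apple", "ape", "peach", "puppy", "apply"]
print(words_within("appel", dictionary))  # ['apple', 'ape', 'apply']
```

Note that plain Levenshtein distance counts a swap of two adjacent letters as two edits, which is why "appel" is at distance 2 from "apple" rather than 1.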

answered Sep 18 '22 06:09

Salvador Dali

difflib in the standard library has various utilities for sequence matching, including the get_close_matches function, which you could use here. It uses an algorithm adapted from Ratcliff and Obershelp.

From the docs:

>>> from difflib import get_close_matches
>>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
['apple', 'ape']
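As a rough sketch of how the results can be tuned, get_close_matches also accepts n (the maximum number of matches to return) and cutoff (the minimum similarity score, between 0 and 1). The scores come from SequenceMatcher.ratio(), which is a similarity ratio and not an edit distance, so a cutoff does not map directly onto "distance of 2". The word list and the cutoff value below are only illustrative.

```python
from difflib import SequenceMatcher, get_close_matches

words = ["ape", "apple", "peach", "puppy"]

# Return only the single best match.
print(get_close_matches("appel", words, n=1))          # ['apple']

# Raise the similarity threshold above the default of 0.6.
print(get_close_matches("appel", words, cutoff=0.78))  # ['apple']

# Inspect the similarity score that get_close_matches ranks by.
for w in words:
    score = SequenceMatcher(None, w, "appel").ratio()
    print(w, round(score, 2))
```

If an exact edit-distance bound matters, a Levenshtein-based filter is the safer choice; difflib is convenient when a ranked list of "close enough" suggestions is all that is needed.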

answered Sep 21 '22 06:09

ryanjdillon

Tags: python, algorithm, edit, distance
