Get close matches for multiple words in a dictionary

Question

I have a dictionary with the following structure:

{
    1: {"names": ["name1_A", "name1_B", ...]},
    2: {"names": ["name2_A", "name2_B", ...]},
    ...
}

where name1_A and name1_B are synonyms/aliases/different ways to write the same name, whose ID is 1. name2_A and name2_B are aliases of the same name, whose ID is 2, and so on.

I need to write a function which takes a user input and returns the ID of the name whose alias is most similar to the user input.

I know it's not very intuitive to understand what I mean, so here's an example. Let's say this is my dictionary:

{
    1: {"names": ["James", "Jamie"]},
    2: {"names": ["Karen", "Karyn"]}
}

The user types in the word Jimmy. Since the closest match to Jimmy from the dictionary is Jamie, the function has to return the ID 1.

If the user types in the world Karena, since the closest match is Karen, the function has to return the ID 2.

I think the best way to get the closest math is to use difflib's get_close_matches(). However, that function takes a list of possibilities as argument, and I cannot think of a way to correctly use it in my function. Any help would be appreciated.

cs95 · Accepted Answer

If you're interested in 3rd party modules, there's a nice little module I like to use for this sort of thing called fuzzywuzzy, for fuzzy string matching in Python. This module uses the Levenshtein Distance metric for computing distances between two strings. Here's an example of how you use it:

>>> from fuzzywuzzy import fuzz
>>> from functools import partial
>>> data_dict = {
...     1: {"names": ["James", "Jamie"]},
...     2: {"names": ["Karen", "Karyn"]}
... }
>>> input_str = 'Karena'
>>> f = partial(fuzz.partial_ratio, input_str)
>>> matches = { k : max(data_dict[k]['names'], key=f) for k in data_dict}
>>> matches
{1: 'James', 2: 'Karen'}
>>> { i : (matches[i], f(matches[i])) for i in matches }
{1: ('James', 40), 2: ('Karen', 100)}

Now, you can extract Karen since it has the highest score.

I've had to call the function twice for the purpose of this demo, but you should be able to do that just once depending on how you expand upon this example.

Another thing to note is that fuzz.partial_ratio is more lenient with its matches. For a stricter matching scheme, consider using fuzz.ratio.

You can peruse more examples using fuzzy string matching here.

Get close matches for multiple words in a dictionary

Tags:

python

string

dictionary

fuzzy-search

user2747949

1 Answers

cs95

Recent Activity

Donate For Us

Get close matches for multiple words in a dictionary

Tags:

python

string

dictionary

fuzzy-search

user2747949

1 Answers

cs95

Related questions

Recent Activity

Donate For Us