I have a dictionary with the following structure:
{
1: {"names": ["name1_A", "name1_B", ...]},
2: {"names": ["name2_A", "name2_B", ...]},
...
}
where name1_A
and name1_B
are synonyms/aliases/different ways to write the same name, whose ID is 1. name2_A
and name2_B
are aliases of the same name, whose ID is 2, and so on.
I need to write a function which takes a user input and returns the ID of the name whose alias is most similar to the user input.
I know it's not very intuitive to understand what I mean, so here's an example. Let's say this is my dictionary:
{
1: {"names": ["James", "Jamie"]},
2: {"names": ["Karen", "Karyn"]}
}
The user types in the word Jimmy
. Since the closest match to Jimmy
from the dictionary is Jamie
, the function has to return the ID 1.
If the user types in the world Karena
, since the closest match is Karen
, the function has to return the ID 2.
I think the best way to get the closest math is to use difflib's get_close_matches()
. However, that function takes a list of possibilities as argument, and I cannot think of a way to correctly use it in my function. Any help would be appreciated.
If you're interested in 3rd party modules, there's a nice little module I like to use for this sort of thing called fuzzywuzzy
, for fuzzy string matching in Python. This module uses the Levenshtein Distance metric for computing distances between two strings. Here's an example of how you use it:
>>> from fuzzywuzzy import fuzz
>>> from functools import partial
>>> data_dict = {
... 1: {"names": ["James", "Jamie"]},
... 2: {"names": ["Karen", "Karyn"]}
... }
>>> input_str = 'Karena'
>>> f = partial(fuzz.partial_ratio, input_str)
>>> matches = { k : max(data_dict[k]['names'], key=f) for k in data_dict}
>>> matches
{1: 'James', 2: 'Karen'}
>>> { i : (matches[i], f(matches[i])) for i in matches }
{1: ('James', 40), 2: ('Karen', 100)}
Now, you can extract Karen
since it has the highest score.
I've had to call the function twice for the purpose of this demo, but you should be able to do that just once depending on how you expand upon this example.
Another thing to note is that fuzz.partial_ratio
is more lenient with its matches. For a stricter matching scheme, consider using fuzz.ratio
.
You can peruse more examples using fuzzy string matching here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With