
String comparison for multiple values in Python

I have two sets of data. The first (A) is a list of equipment with sophisticated names. The second (B) is a list of broader equipment categories, into which I have to group the items of the first list using string comparisons. I'm aware this won't be perfect.

For each entity in List A, I'd like to compute the Levenshtein distance to each entity in List B. The record in List B with the highest score will be the group to which I'll assign that data point.

I'm very rusty in Python and am playing around with FuzzyWuzzy to get the distance between two string values. However, I can't quite figure out how to iterate through each list to produce what I need.

I presumed I'd just create a list for each data set and write a pretty basic loop for each, but like I said, I'm a little rusty and not having any luck.

Any help would be greatly appreciated! If there is another package that will allow me to do this (other than FuzzyWuzzy), I'm glad to take suggestions.

MacAnRiogh asked Oct 05 '17 01:10




1 Answer

It looks like the process.extractOne function is what you're looking for. A simple use case looks something like this:

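from fuzzywuzzy import process
from collections import defaultdict

complicated_names = ['leather couch', 'left-handed screwdriver', 'tomato peeler']
generic_names = ['couch', 'screwdriver', 'peeler']

# Maps each generic name to the list of complicated names assigned to it
group = defaultdict(list)

for name in complicated_names:
    # extractOne returns a (best_match, score) tuple for the closest choice
    best_match, score = process.extractOne(name, generic_names)
    group[best_match].append(name)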

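defaultdict is a dictionary subclass that creates a default value (here an empty list) the first time a missing key is accessed, so there is no need to check whether a group already exists before appending to it.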

We loop over all the complicated names, use fuzzywuzzy to find the closest match, and then add the name to the list associated with that match.
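
Since the question also asks about options other than FuzzyWuzzy, here is a minimal sketch of the same loop written out explicitly with the standard library's difflib. Note that SequenceMatcher.ratio() is a different similarity measure from Levenshtein distance, so the scores will not match FuzzyWuzzy's. The helper names best_match and group_by_best_match and the min_score parameter are made up for this illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def best_match(name, choices):
    """Return the (choice, score) pair with the highest similarity to name."""
    scored = [
        (choice, SequenceMatcher(None, name.lower(), choice.lower()).ratio())
        for choice in choices
    ]
    # max() keeps the first choice when several scores tie
    return max(scored, key=lambda pair: pair[1])


def group_by_best_match(names, categories, min_score=0.0):
    """Group each name under its most similar category.

    Names whose best score is below min_score (a hypothetical cutoff
    between 0.0 and 1.0) are collected under the key None.
    """
    groups = defaultdict(list)
    for name in names:
        category, score = best_match(name, categories)
        if score < min_score:
            category = None
        groups[category].append(name)
    return dict(groups)


complicated_names = ['leather couch', 'left-handed screwdriver', 'tomato peeler']
generic_names = ['couch', 'screwdriver', 'peeler']

groups = group_by_best_match(complicated_names, generic_names)
```

Raising min_score is one way to keep poor matches out of the real groups so they can be reviewed by hand, which may help given that the matching is not expected to be perfect.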

Patrick Haugh answered Sep 18 '22 00:09