I am trying to compare each row with every other row in a pandas dataframe using fuzzywuzzy.fuzz.partial_ratio() >= 85
and write the ids of the matching rows to a list for each row.
Example:
df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6], 'name': ['dog', 'cat', 'mad cat', 'good dog', 'bad dog', 'chicken']})
I want to use a pandas function with the fuzzywuzzy library to get this result:
id  name      match_id_list
1   dog       [4, 5]
2   cat       [3]
3   mad cat   [2]
4   good dog  [1, 5]
5   bad dog   [1, 4]
6   chicken   []
But I don't understand how to get this.
Solution #1: To iterate over the rows of a pandas dataframe, we can use the DataFrame.iterrows() function and append the data from each row to the end of a list.
Use isin() to select rows from a list of values. The isin() method filters rows whose value appears in a given list. You can keep the list of values in a variable and pass it to isin(), or pass the list directly.
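As a small illustration of isin(), here is a sketch that pulls out the rows whose id appears in a list. The match_ids list is hard-coded here purely for illustration; in practice it would come from the fuzzy comparison.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6],
                   'name': ['dog', 'cat', 'mad cat', 'good dog', 'bad dog', 'chicken']})

# Hypothetical list of ids, hard-coded for this example
match_ids = [4, 5]

# Series.isin() builds a boolean mask; indexing with it keeps the matching rows
selected = df[df['id'].isin(match_ids)]
```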
The first step would be to find the indices that match the condition for a given name. Since partial_ratio only takes strings, we apply it to the dataframe row by row:
from fuzzywuzzy.fuzz import partial_ratio
name = 'dog'
df.apply(lambda row: (partial_ratio(row['name'], name) >= 85), axis=1)
We can then use enumerate and a list comprehension to generate the list of True indices in the boolean array:
matches = df.apply(lambda row: (partial_ratio(row['name'], name) >= 85), axis=1)
[i for i, x in enumerate(matches) if x]
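Note that enumerate yields 0-based positions, not the values in the id column. If you want ids instead, one option is to use the boolean Series as a mask on df. The sketch below assumes a simple substring check as a stand-in for partial_ratio(...) >= 85, only so that it runs without fuzzywuzzy installed:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6],
                   'name': ['dog', 'cat', 'mad cat', 'good dog', 'bad dog', 'chicken']})
name = 'dog'

# Stand-in condition (assumption): replace with partial_ratio(other, name) >= 85
matches = df['name'].apply(lambda other: name in other)

# 0-based row positions, as produced by the enumerate approach
positions = [i for i, x in enumerate(matches) if x]

# Values of the id column for the same rows, selected with the boolean mask
matched_ids = df.loc[matches, 'id'].tolist()
```

Both lists describe the same rows; they only differ in whether they hold positions or ids.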
Let's put all this inside a function:
def func(name):
    matches = df.apply(lambda row: (partial_ratio(row['name'], name) >= 85), axis=1)
    return [i for i, x in enumerate(matches) if x]
We can now apply the function to the entire dataframe:
df.apply(lambda row: func(row['name']), axis=1)
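One caveat: func returns 0-based positions and includes the row itself, because every name matches itself with a score of 100. To get something closer to the match_id_list column from the question, a possible variation is sketched below. It assumes the id values are unique. The helper name add_match_ids and the scorer parameter are made up for this example, and substring_score is only a stand-in so the code runs without fuzzywuzzy; with the library installed you would pass fuzz.partial_ratio instead, and the resulting lists may differ from the stand-in's.

```python
import pandas as pd

def add_match_ids(df, scorer, threshold=85):
    """Return a copy of df with a match_id_list column.

    scorer is any callable taking two strings and returning a 0-100 score,
    for example fuzzywuzzy's fuzz.partial_ratio.
    """
    ids = df['id'].tolist()
    names = df['name'].tolist()
    out = df.copy()
    out['match_id_list'] = [
        # Collect ids of the other rows whose name scores at or above the threshold
        [other_id for other_id, other_name in zip(ids, names)
         if other_id != row_id and scorer(name, other_name) >= threshold]
        for row_id, name in zip(ids, names)
    ]
    return out

def substring_score(a, b):
    # Hypothetical stand-in scorer: 100 if one string contains the other, else 0
    return 100 if a in b or b in a else 0

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6],
                   'name': ['dog', 'cat', 'mad cat', 'good dog', 'bad dog', 'chicken']})
result = add_match_ids(df, substring_score)
```

Like the apply-based version, this calls the scorer once for every pair of rows, so the work grows quadratically with the number of rows.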