
Compare each row with all rows in data frame and save results in list for each row

I am trying to compare each row with every other row in a pandas dataframe, treating two rows as a match when fuzzywuzzy.fuzz.partial_ratio() >= 85, and to store the ids of the matching rows in a list for each row.

Example:

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6], 'name': ['dog', 'cat', 'mad cat', 'good dog', 'bad dog', 'chicken']})

I want to use a pandas function with the fuzzywuzzy library to get the result:

id  name      match_id_list
1   dog       [4, 5]
2   cat       [3]
3   mad cat   [2]
4   good dog  [1, 5]
5   bad dog   [1, 4]
6   chicken   []

But I don't understand how to get this.

pirr asked Feb 17 '16 14:02




1 Answer

The first step is to find which rows match the condition for a given name. Since partial_ratio compares one pair of strings at a time, we apply it to the dataframe row by row:

from fuzzywuzzy.fuzz import partial_ratio

name = 'dog'
df.apply(lambda row: (partial_ratio(row['name'], name) >= 85), axis=1)
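
To make the scoring step less opaque, here is a simplified sketch of the idea behind a partial ratio: slide the shorter string across the longer one and keep the best similarity score. It relies only on difflib from the standard library. The helper name sliding_partial_ratio is hypothetical, and this is an illustration rather than fuzzywuzzy's actual implementation, so its scores may differ from partial_ratio in some cases.

```python
from difflib import SequenceMatcher


def sliding_partial_ratio(a, b):
    """Best 0-100 similarity between the shorter string and any
    equally long window of the longer string (illustration only)."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return 0
    best = 0.0
    # Try every window of the longer string that has the shorter string's length.
    for start in range(len(longer) - len(shorter) + 1):
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return int(round(100 * best))


print(sliding_partial_ratio('dog', 'good dog'))  # 'dog' occurs verbatim in 'good dog'
print(sliding_partial_ratio('dog', 'cat'))       # the strings share no characters
```

Under this sketch a name that appears verbatim inside another scores 100, which is why 'dog' matches both 'good dog' and 'bad dog' at a threshold of 85.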

We can then use enumerate and a list comprehension to collect the positions of the True values in the resulting boolean Series:

matches = df.apply(lambda row: (partial_ratio(row['name'], name) >= 85), axis=1)
[i for i, x in enumerate(matches) if x]

Let's put all this inside a function:

def func(name):
    matches = df.apply(lambda row: (partial_ratio(row['name'], name) >= 85), axis=1)
    return [i for i, x in enumerate(matches) if x]

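We can now apply the function to the entire dataframe. Note that the resulting lists contain 0-based row positions rather than id values, and that each row also matches itself: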

df.apply(lambda row: func(row['name']), axis=1)
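
The question asks for id values and leaves each row out of its own list, so a variant along the following lines may be closer to the desired output. This is a minimal sketch, not the answer's method: match_id_lists and toy_scorer are hypothetical names, the scorer is passed in as a parameter, and toy_scorer is only a stand-in (100 when one string contains the other, otherwise 0) so that the block runs without third-party packages.

```python
def match_id_lists(ids, names, scorer, threshold=85):
    """For each name, list the ids of the other rows whose score
    against it reaches the threshold."""
    ids, names = list(ids), list(names)
    result = []
    for i, name in enumerate(names):
        result.append([
            other_id
            for j, (other_id, other_name) in enumerate(zip(ids, names))
            if j != i and scorer(name, other_name) >= threshold  # skip the row itself
        ])
    return result


def toy_scorer(a, b):
    # Stand-in scorer (assumption for this sketch): 100 if one string
    # contains the other, else 0. Replace with a real fuzzy scorer.
    return 100 if a in b or b in a else 0


ids = [1, 2, 3, 4, 5, 6]
names = ['dog', 'cat', 'mad cat', 'good dog', 'bad dog', 'chicken']
print(match_id_lists(ids, names, toy_scorer))
```

With the question's dataframe and fuzzywuzzy installed, the same function could be called as df['match_id_list'] = match_id_lists(df['id'], df['name'], partial_ratio). Every pair of rows is compared, so the cost grows quadratically with the number of rows.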
IanS answered Oct 12 '22 17:10