Compare each row with all rows in data frame and save results in list for each row

Tags: python, pandas, data-analysis, fuzzywuzzy

I am trying to compare each row with every other row in a pandas DataFrame using fuzzywuzzy.fuzz.partial_ratio() >= 85, and to collect the ids of the matching rows in a list for each row.

Example:

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6], 'name': ['dog', 'cat', 'mad cat', 'good dog', 'bad dog', 'chicken']})

I want to use a pandas function with the fuzzywuzzy library to get the result:

id  name      match_id_list
1   dog       [4, 5]
2   cat       [3]
3   mad cat   [2]
4   good dog  [1, 5]
5   bad dog   [1, 4]
6   chicken   []

But I don't understand how to get this.


asked Feb 17 '16 14:02

pirr

1 Answer

The first step is to find the rows that satisfy the condition for a given name. Since partial_ratio compares two strings at a time, we apply it to the dataframe row by row:

from fuzzywuzzy.fuzz import partial_ratio

name = 'dog'
df.apply(lambda row: (partial_ratio(row['name'], name) >= 85), axis=1)

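We can then use enumerate and a list comprehension to collect the positions of the True values in the resulting boolean Series. Note that these are 0-based row positions rather than values of the id column, and that each name also matches itself: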

matches = df.apply(lambda row: (partial_ratio(row['name'], name) >= 85), axis=1)
[i for i, x in enumerate(matches) if x]

Let's put all this inside a function:

def func(name):
    matches = df.apply(lambda row: (partial_ratio(row['name'], name) >= 85), axis=1)
    return [i for i, x in enumerate(matches) if x]

We can now apply the function to the entire dataframe:

df.apply(lambda row: func(row['name']), axis=1)
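Because func returns 0-based row positions and counts each row as a match for itself, its output is not yet the match_id_list from the question. The following is a minimal sketch of one way to get there, not the only one. The helper name match_ids and its threshold parameter are hypothetical. The difflib fallback is only a rough stand-in so that the snippet still runs when fuzzywuzzy is not installed; it is not the library's algorithm.

```python
import pandas as pd

try:
    from fuzzywuzzy.fuzz import partial_ratio
except ImportError:
    # Assumption: rough stand-in used only when fuzzywuzzy is missing.
    # It slides the shorter string over the longer one and keeps the best
    # difflib similarity, scaled to 0-100.
    from difflib import SequenceMatcher

    def partial_ratio(a, b):
        short, long_ = sorted((a, b), key=len)
        best = max(
            SequenceMatcher(None, short, long_[i:i + len(short)]).ratio()
            for i in range(len(long_) - len(short) + 1)
        )
        return int(round(100 * best))


def match_ids(df, threshold=85):
    """For each row, list the ids of the *other* rows whose name
    scores at least `threshold` against this row's name."""
    ids = df['id'].tolist()
    names = df['name'].tolist()
    result = []
    for i, name in enumerate(names):
        result.append([
            ids[j]
            for j, other in enumerate(names)
            if j != i and partial_ratio(name, other) >= threshold
        ])
    # Wrap in a Series aligned on the dataframe's index
    return pd.Series(result, index=df.index)


df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6],
                   'name': ['dog', 'cat', 'mad cat', 'good dog', 'bad dog', 'chicken']})
df['match_id_list'] = match_ids(df)
print(df)
```

For the example data this gives [4, 5] for dog, [3] for cat, [2] for mad cat and [] for chicken. Whether good dog and bad dog match each other depends on their score reaching the threshold, so it is worth printing the scores before settling on 85. The nested loop makes one comparison per pair of rows, which is fine for small frames but grows quadratically with the number of rows.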


answered Oct 12 '22 17:10

IanS
