Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas - find rows with matching values in two columns and multiply value in another column

First suppose we have a dataframe below:

import pandas as pd
data = pd.DataFrame({'id':['1','2','3','4','5','6','7','8'], 
                     'A':['foo', 'bar', 'foo', 'bar','foo', 'bar', 'foo', 'foo'],  
                     'C':['10','10','10','30','50','60','50','8'], 
                     'D':['9','8','7','6','5','4','3','2']})
print(data)

    A   C   D   id
0   foo 10  9   1
1   bar 10  8   2
2   foo 10  7   3
3   bar 30  6   4
4   foo 50  5   5
5   bar 60  4   6
6   foo 50  3   7
7   foo 8   2   8

What I would like to do is find match rows and then do some calculation.

for any two ids(idx, idy) in data.iterrows():
       if idx.A == idy.A and idx.C = idy.C:
       result = idx.D * idy.D

and then generate a new dataframe with three columns ['id'], ['A'] and ['result'].

So a few rows of expected result is:

     id   A   result   
0    1   foo   63   
1    3   foo   63   
2    5   foo   15
3    7   foo   15

I have tried but the results are either wrong logic or wrong code/data format. Can someone give me a hand please?

like image 207
Alex12346 Avatar asked Dec 17 '25 14:12

Alex12346


2 Answers

One way is to groupby A + C, take the product and count, filter out those that only have a single item in the group, then inner merge back on A + C to your original frame, eg:

df.merge(
    df.groupby(['A', 'C']).D.agg(['prod', 'count'])
    [lambda r: r['count'] > 1],
    left_on=['A', 'C'],
    right_index=True
)

Gives you:

     A   C  D  id  prod  count
0  foo  10  9   1    63      2
2  foo  10  7   3    63      2
4  foo  50  5   5    15      2
6  foo  50  3   7    15      2

Then drop/rename columns as appropriate.

like image 185
Jon Clements Avatar answered Dec 20 '25 16:12

Jon Clements


You can use self-join technique:

data[['id', 'C', 'D']] = data[['id', 'C', 'D']].apply(pd.to_numeric)
joint = pd.merge(data, data, on=('A', 'C'))
joint = joint.loc[join['id_x'] != join['id_y']]
joint['result'] = joint['D_x'] * joint['D_y']
result = joint[['id_x', 'A', 'result']]
result.columns = ['id', 'A', 'result']

Result:

   id    A  result
1   1  foo      63
2   3  foo      63
7   5  foo      15
8   7  foo      15
like image 27
Lev Zakharov Avatar answered Dec 20 '25 18:12

Lev Zakharov



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!