
Deleting rows in pandas dataframe based on pair value

I have dataframe as below:

df = pd.DataFrame({'User':['a','a','a','b','b','b'],
                 'Type':['101','102','101','101','101','102'],
                 'Qty':[10, -10, 10, 30, 5, -5]})

I want to remove pairs of rows with df['Type'] = 101 and 102 whose df['Qty'] values net each other off. The end result would be:

df = pd.DataFrame({'User':['a','b'],
                   'Type':['101', '101'],
                   'Qty':[10, 30]})

I tried converting the negative values to absolute numbers and removing duplicates:

df['Qty'] = df['Qty'].abs()
df.drop_duplicates(subset=['Qty'], keep='first')

But that wrongly gives me this dataframe:

df = pd.DataFrame({'User':['a','b', 'b'],
                   'Type':['101', '101', '101'],
                   'Qty':[10, 30, 5]})
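
For reference, the attempt above can be reproduced with a minimal sketch (the name `attempt` is only illustrative and not part of the original code). Because `drop_duplicates` is told to look only at `Qty`, it keeps the first row of every repeated absolute value instead of removing both rows of an offsetting pair, and it ignores `User` and `Type` entirely:

```python
import pandas as pd

df = pd.DataFrame({'User': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'Type': ['101', '102', '101', '101', '101', '102'],
                   'Qty': [10, -10, 10, 30, 5, -5]})

# Same two steps as the attempt, without overwriting df
attempt = (df.assign(Qty=df['Qty'].abs())
             .drop_duplicates(subset=['Qty'], keep='first'))

# Rows 0, 3 and 4 survive: row 4 (b, 101, 5) stays even though row 5 offsets it
print(attempt)
```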

rain123 asked Mar 02 '23 07:03


1 Answer

The idea is to create all combinations of two index values within each User group, then test whether each pair contains both Types and sums to 0, collecting the matched pairs in a set:

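# the solution needs unique index values
df = df.reset_index(drop=True)

from itertools import combinations

# index pairs whose rows cancel each other out
out = set()
def f(x):
    # test every pair of rows within one user's group
    for i in combinations(x.index, 2):
        a = x.loc[list(i)]
        # the pair must contain both types and sum to zero
        if (set(a['Type']) == {'101', '102'}) and (a['Qty'].sum() == 0):
            out.add(i)

df.groupby('User').apply(f)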

print (out)
{(0, 1), (4, 5), (1, 2)}

Then remove every pair that reuses an index value already seen in another pair, like (1, 2) here:

s = pd.Series(list(out)).explode()
idx = s.index[s.duplicated()]
final = s.drop(idx)
print (final)
0    0
0    1
1    4
1    5
dtype: object
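
The order in which a set is iterated is an implementation detail, so which of two overlapping pairs counts as the duplicate depends on the order of `list(out)`. Here is a small sketch of this filtering step, assuming the three pairs printed above; sorting them first is an assumption added here for reproducibility and is not required by the approach:

```python
import pandas as pd

# The three candidate pairs found in the previous step
out = {(0, 1), (4, 5), (1, 2)}

# One pair per label, one index value per row after explode
s = pd.Series(sorted(out)).explode()

# Labels of pairs that reuse an index value seen in an earlier pair
idx = s.index[s.duplicated()]

# Dropping a label removes both members of that pair
final = s.drop(idx)
print(final.tolist())  # [0, 1, 4, 5]
```

With the sorted order the pair (1, 2) is the one discarded, because index 1 already appears in (0, 1).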

Finally, remove those rows from the original DataFrame:

df = df.drop(final)
print (df)
  User Type  Qty
2    a  101   10
3    b  101   30
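
The steps above can also be wrapped in one function. The following is only a sketch of a variation, not the exact code above: the function name `drop_offsetting_pairs` and its `types` parameter are hypothetical, and the explode/duplicated filter is swapped for a set of already matched index values, so each row is paired at most once.

```python
from itertools import combinations

import pandas as pd


def drop_offsetting_pairs(df, types=('101', '102')):
    """Drop pairs of rows per User that have both types and sum to zero."""
    # Unique index values are needed so that pairs identify rows
    df = df.reset_index(drop=True)
    used = set()  # index values that already belong to a matched pair
    for _, group in df.groupby('User'):
        for i, j in combinations(group.index, 2):
            if i in used or j in used:
                continue  # a row may cancel only one other row
            pair = group.loc[[i, j]]
            if set(pair['Type']) == set(types) and pair['Qty'].sum() == 0:
                used.update((i, j))
    return df.drop(index=sorted(used))


df = pd.DataFrame({'User': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'Type': ['101', '102', '101', '101', '101', '102'],
                   'Qty': [10, -10, 10, 30, 5, -5]})

result = drop_offsetting_pairs(df)
print(result)
```

On the sample data this keeps rows 2 and 3, the same as the output above. Like the original, it tests every pair of rows within a user, so the work grows quadratically with group size.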

jezrael answered Mar 16 '23 05:03