I have a dataframe as below:
df = pd.DataFrame({'User':['a','a','a','b','b','b'],
'Type':['101','102','101','101','101','102'],
'Qty':[10, -10, 10, 30, 5, -5]})
I want to remove pairs of rows with df['Type'] equal to 101 and 102 whose df['Qty'] values net each other off. The end result should be:
df = pd.DataFrame({'User':['a','b'],
'Type':['101', '101'],
'Qty':[10, 30]})
I tried converting the negative values to absolute values and then removing duplicates:
df['Qty'] = df['Qty'].abs()
df.drop_duplicates(subset=['Qty'], keep='first')
But that wrongly gives me this dataframe:
df = pd.DataFrame({'User':['a','b', 'b'],
'Type':['101', '101', '101'],
'Qty':[10, 30, 5]})
The idea is to create combinations of index values per group and test whether each pair contains both Types and sums to 0, collecting the matched pairs in a set:
# the solution needs unique index values
df = df.reset_index(drop=True)

from itertools import combinations

out = set()

def f(x):
    # test every pair of rows within one User group
    for i in combinations(x.index, 2):
        a = x.loc[list(i)]
        if (set(a['Type']) == {'101', '102'}) and (a['Qty'].sum() == 0):
            out.add(i)

df.groupby('User').apply(f)
print(out)
{(0, 1), (4, 5), (1, 2)}
Then drop every pair that reuses a row already present in another pair, like (1, 2) here:
s = pd.Series(list(out)).explode()
idx = s.index[s.duplicated()]
final = s.drop(idx)
print (final)
0 0
0 1
1 4
1 5
dtype: object
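To make that step easier to follow, here is a small sketch that runs it on the three pairs printed above. The names `pairs`, `dup_pairs` and `rows_to_drop` are only illustrative. Sorting the pairs first is my own addition, on the assumption that you want the result to be independent of set iteration order; which of two overlapping pairs survives depends on that order.

```python
import pandas as pd

pairs = {(0, 1), (4, 5), (1, 2)}  # the matched pairs found earlier

# One row per element: index = position of the pair, value = row label.
# After sorting: index 0,0,1,1,2,2 and values 0,1,1,2,4,5.
s = pd.Series(sorted(pairs)).explode()

# A row label seen for the second time marks its pair as overlapping.
dup_pairs = s.index[s.duplicated()]

# Dropping by pair position removes both members of the overlapping pair.
final = s.drop(dup_pairs)
rows_to_drop = final.tolist()
print(rows_to_drop)  # [0, 1, 4, 5]
```

Here row 1 appears in both (0, 1) and (1, 2), so the later pair (1, 2) is discarded and row 2 stays in the data.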
Finally, remove those rows from the original DataFrame:
df = df.drop(final)
print (df)
  User Type  Qty
2    a  101   10
3    b  101   30
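If you prefer everything in one place, the steps can be wrapped into a single function without the global set. This is only a sketch: `drop_offsetting_pairs` is a hypothetical helper name, and instead of the explode/duplicated step it resolves overlapping pairs greedily in sorted order, which gives the same result for this data but is a slightly different technique.

```python
from itertools import combinations

import pandas as pd


def drop_offsetting_pairs(df, types=("101", "102")):
    """Drop pairs of rows (same User, one of each Type) whose Qty sums to 0."""
    df = df.reset_index(drop=True)  # the approach needs unique index labels
    wanted = set(types)

    # Collect candidate pairs of row labels, one User group at a time.
    pairs = []
    for _, group in df.groupby("User"):
        for i, j in combinations(group.index, 2):
            pair = group.loc[[i, j]]
            if set(pair["Type"]) == wanted and pair["Qty"].sum() == 0:
                pairs.append((i, j))

    # Greedy choice (an assumption, not part of the steps above):
    # accept a pair only if neither row was used by an earlier pair.
    used = set()
    for i, j in sorted(pairs):
        if i not in used and j not in used:
            used.update((i, j))

    return df.drop(index=sorted(used))


df = pd.DataFrame({
    "User": ["a", "a", "a", "b", "b", "b"],
    "Type": ["101", "102", "101", "101", "101", "102"],
    "Qty": [10, -10, 10, 30, 5, -5],
})
result = drop_offsetting_pairs(df)
print(result)
```

Like the original, this checks every pair of rows within a group, so it can get slow for users with many rows.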