I have a DataFrame similar to this:
import pandas as pd
df = pd.DataFrame([['a','b'],
['c','a'],
['c','d'],
['a','e'],
['p','g'],
['d','a'],
['c', 'g']
], columns=['col1','col2'])
I need to delete rows once an element has appeared a certain number of times. For example, say I want each value to appear at most 2 times in this DataFrame (counting both columns); the final DataFrame could look like this:
[['a','b'],
['c','a'],
['c','d'],
['p','g']
]
It doesn't matter which rows are deleted. I just want to cap the number of times any value appears in my DataFrame.
Many Thanks!
IIUC, try:
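n = 2  # maximum number of times a value may appear
s = df.stack()  # flatten both columns into one Series, row by row
# keep the first n occurrences of each value, then drop rows that lost a cell
s[(s.groupby(s).cumcount() + 1).le(n)].unstack().dropna()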
  col1 col2
0    a    b
1    c    a
2    c    d
4    p    g
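To make the one-liner easier to follow, here is the same idea broken into named steps. This is only a sketch: the helper name cap_occurrences and the intermediate variable names are my own for illustration and not part of pandas, and the data is the frame from the question.

```python
import pandas as pd

df = pd.DataFrame(
    [['a', 'b'], ['c', 'a'], ['c', 'd'], ['a', 'e'],
     ['p', 'g'], ['d', 'a'], ['c', 'g']],
    columns=['col1', 'col2'],
)

def cap_occurrences(frame, n):
    """Hypothetical helper: keep rows so that no value appears more than n times."""
    # Flatten the frame into one Series, read row by row (col1, col2, col1, ...).
    s = frame.stack()
    # Number each occurrence of a value in reading order: 1 for the first, 2 for the second, ...
    occurrence = s.groupby(s).cumcount() + 1
    # Keep only the cells that are among the first n occurrences of their value.
    kept_cells = s[occurrence.le(n)]
    # Rebuild the two-column shape; a row that lost a cell now has a NaN and is dropped.
    return kept_cells.unstack().dropna()

result = cap_occurrences(df, n=2)
# Count how often each value appears in the result, across both columns.
counts = result.stack().value_counts()
print(result)
print(counts)
```

Note that occurrences are counted before any row is dropped, so a cell in a row that ends up being removed still uses up one of the n slots for its value. The result always respects the cap, but it may drop more rows than strictly necessary (here 'd' ends up appearing only once).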