Delete rows after a value has appeared a certain number of times

I have a dataframe like this:

import pandas as pd

df = pd.DataFrame([['a','b'],
                   ['a','c'],
                   ['c','d'],
                   ['a','e'],
                   ['p','g'],
                   ['d','a'],
                   ['c','g']
                  ], columns=['col1','col2'])

I need to delete rows once a value has already appeared a certain number of times. For example, if each value may appear at most 2 times in the dataframe (counting both columns together), the final dataframe could look like this:

[['a','b'], 
 ['a','c'],
 ['c','d'],
 ['p','g']
]

It doesn't matter which rows get deleted. I only want to cap the number of times each value appears in my dataframe.

Many Thanks!

Sapling asked Dec 18 '22 15:12


1 Answer

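IIUC, you can stack both columns into one Series, keep a running count of each value, and drop every row that contains a value past the limit: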

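n = 2  # maximum number of times a value may appear
s = df.stack()  # both columns as one Series, in row order
# keep cells whose running count is within n, then drop rows that lost a cell
s[(s.groupby(s).cumcount() + 1).le(n)].unstack().dropna()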

  col1 col2
0    a    b
1    a    c
2    c    d
4    p    g
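
To spell out what the one-liner does, here is a rough sketch that splits it into named steps. The function names `cap_by_stack` and `cap_greedy` are my own and not part of pandas, and the sample frames are only modeled on the question. One thing to be aware of: the stacked count also includes values from rows that end up being dropped, so it can remove more rows than strictly necessary. `cap_greedy` is a hypothetical alternative (a plain loop, not what the one-liner does) that only counts values in rows it actually keeps.

```python
from collections import Counter

import pandas as pd


def cap_by_stack(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Stack-based approach, split into named steps."""
    s = df.stack()                      # one long Series, read row by row
    seen = s.groupby(s).cumcount() + 1  # running count of each value
    kept = s[seen.le(n)]                # cells still within the cap
    return kept.unstack().dropna()      # a row that lost a cell is dropped


def cap_greedy(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Hypothetical alternative: count only values in rows that are kept."""
    counts = Counter()
    keep = []
    for row in df.itertuples(index=False):
        trial = counts + Counter(row)   # counts if this row were kept
        ok = all(trial[v] <= n for v in row)
        keep.append(ok)
        if ok:
            counts = trial              # commit counts only for kept rows
    return df.loc[keep]


# Sample frame modeled on the question.
df = pd.DataFrame(
    [['a', 'b'], ['a', 'c'], ['c', 'd'], ['a', 'e'],
     ['p', 'g'], ['d', 'a'], ['c', 'g']],
    columns=['col1', 'col2'],
)

# Made-up frame where the two approaches differ: row 2 is dropped because
# of 'x', but the stack version has already counted its 'w'.
df2 = pd.DataFrame(
    [['x', 'y'], ['x', 'z'], ['x', 'w'], ['w', 'q'], ['w', 'r']],
    columns=['col1', 'col2'],
)

print(cap_by_stack(df, 2))
print(cap_greedy(df, 2))
print(cap_by_stack(df2, 2))
print(cap_greedy(df2, 2))
```

On the first frame both functions keep rows 0, 1, 2 and 4. On the second one the stack version keeps rows 0, 1 and 3, while the greedy loop also keeps row 4. If it doesn't matter which rows go, the one-liner is fine; the loop is only worth it if you want to keep as many rows as a single top-to-bottom pass allows.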
anky answered Dec 26 '22 13:12