Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Keep rows of a pandas dataframe based on both row and column conditions

Hello I have a pandas dataframe that I want to clean.Here is an example:

IDBILL IDBUYER BILL DATE
001 768787 45 1897-07-24
001 768787 67 1897-07-24
001 768787 98 1897-07-24
002 768787 30 1897-07-24
002 768787 15 1897-07-24
002 768787 12 1897-07-24
005 786545 45 1897-08-19
008 657676 89 1989-09-23
009 657676 42 1989-09-23
010 657676 18 1989-09-23
012 657676 51 1990-03-10
016 892354 73 1990-03-10
018 892354 48 1765-02-14
020 892354 62 1765-02-14

I want to delete the highest bills(and keep the lowest when the bills are made on the same day, by the same IDBUYER, and whose bills IDs follow each other. To get this:

IDBILL IDBUYER BILL DATE
002 768787 30 1897-07-24
002 768787 15 1897-07-24
002 768787 12 1897-07-24
005 786545 45 1897-08-19
010 657676 18 1989-09-23
012 657676 51 1990-03-10
016 892354 73 1990-03-10
018 892354 48 1765-02-14
020 892354 62 1765-02-14

Thank you in advance

like image 214
ascha Avatar asked Jan 22 '26 17:01

ascha


1 Answers

One solution:

df = df.sort_values('BILL')
df.loc[df.assign(cc = df.groupby(['DATE','IDBUYER',df.groupby(['DATE','IDBUYER'])['IDBILL'].transform(lambda x: x.diff().gt(1).cumsum())]).cumcount(),cc2 = df.groupby(['DATE','IDBUYER','IDBILL']).transform('count'),floor = lambda x: ~(x['cc'].floordiv(x['cc2'],axis=0).astype(bool)))['floor']].sort_index()

or:

(df.loc[df.groupby(['DATE',df['IDBILL'].diff().gt(1).cumsum()])['BILL']
.rank('dense').eq(1)
.groupby([df['IDBILL'],df['IDBUYER']]).transform('any')])
like image 199
rhug123 Avatar answered Jan 25 '26 06:01

rhug123



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!