Keep rows of a pandas dataframe based on both row and column conditions

Question

Hello I have a pandas dataframe that I want to clean.Here is an example:

IDBILL	IDBUYER	BILL	DATE
001	768787	45	1897-07-24
001	768787	67	1897-07-24
001	768787	98	1897-07-24
002	768787	30	1897-07-24
002	768787	15	1897-07-24
002	768787	12	1897-07-24
005	786545	45	1897-08-19
008	657676	89	1989-09-23
009	657676	42	1989-09-23
010	657676	18	1989-09-23
012	657676	51	1990-03-10
016	892354	73	1990-03-10
018	892354	48	1765-02-14
020	892354	62	1765-02-14

I want to delete the highest bills(and keep the lowest when the bills are made on the same day, by the same IDBUYER, and whose bills IDs follow each other. To get this:

IDBILL	IDBUYER	BILL	DATE
002	768787	30	1897-07-24
002	768787	15	1897-07-24
002	768787	12	1897-07-24
005	786545	45	1897-08-19
010	657676	18	1989-09-23
012	657676	51	1990-03-10
016	892354	73	1990-03-10
018	892354	48	1765-02-14
020	892354	62	1765-02-14

Thank you in advance

rhug123 · Accepted Answer

One solution:

df = df.sort_values('BILL')
df.loc[df.assign(cc = df.groupby(['DATE','IDBUYER',df.groupby(['DATE','IDBUYER'])['IDBILL'].transform(lambda x: x.diff().gt(1).cumsum())]).cumcount(),cc2 = df.groupby(['DATE','IDBUYER','IDBILL']).transform('count'),floor = lambda x: ~(x['cc'].floordiv(x['cc2'],axis=0).astype(bool)))['floor']].sort_index()

or:

(df.loc[df.groupby(['DATE',df['IDBILL'].diff().gt(1).cumsum()])['BILL']
.rank('dense').eq(1)
.groupby([df['IDBILL'],df['IDBUYER']]).transform('any')])

Keep rows of a pandas dataframe based on both row and column conditions

Tags:

python

pandas

dataframe

ascha

1 Answers

rhug123

Recent Activity

Donate For Us

Keep rows of a pandas dataframe based on both row and column conditions

Tags:

python

pandas

dataframe

ascha

1 Answers

rhug123

Related questions

Recent Activity

Donate For Us