I have a DataFrame with the following data:
invoice_no dealer billing_change_previous_month date
110 1 0 2016-12-31
100 1 -41981 2017-01-30
5505 2 0 2017-01-30
5635 2 58730 2016-12-31
I want to keep only one row per dealer, the one with the maximum date. The desired output should look like this:
invoice_no dealer billing_change_previous_month date
100 1 -41981 2017-01-30
5505 2 0 2017-01-30
Each dealer should appear only once, with its maximum date. Thanks in advance for your help.
groupby() accepts a list of column names to group by multiple columns, and the resulting groups can be aggregated with one or several aggregation functions at the same time.
Grouping by multiple columns: pass a list of column names to groupby() instead of a single string value.
How to perform groupby on an index in pandas? Pass the index name of the DataFrame to groupby() to group rows on that index. DataFrame.groupby() accepts a string or a list to specify the grouping columns or index levels.
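As a rough illustration of grouping by a list of columns, the sketch below reuses the question's data and adds a hypothetical `year` column (not part of the original question) so that there are two keys to group on. The choice of `sum` and `max` as aggregations is also only an assumption for the example.

```python
import pandas as pd

# Data from the question; "year" is a hypothetical helper column added for illustration.
df = pd.DataFrame({
    "dealer": [1, 1, 2, 2],
    "date": pd.to_datetime(["2016-12-31", "2017-01-30", "2017-01-30", "2016-12-31"]),
    "billing_change_previous_month": [0, -41981, 0, 58730],
})
df["year"] = df["date"].dt.year

# Pass a list of column names to group by several columns,
# then apply more than one aggregation to the selected column.
summary = (
    df.groupby(["dealer", "year"])["billing_change_previous_month"]
      .agg(["sum", "max"])
      .reset_index()
)
print(summary)
```

Each (dealer, year) pair becomes one row of `summary`, with one column per aggregation.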
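A minimal sketch of grouping on an index, assuming the question's data with `dealer` moved into the index (that step is an assumption made for the example, the original DataFrame keeps `dealer` as a column):

```python
import pandas as pd

df = pd.DataFrame({
    "invoice_no": [110, 100, 5505, 5635],
    "dealer": [1, 1, 2, 2],
    "date": pd.to_datetime(["2016-12-31", "2017-01-30", "2017-01-30", "2016-12-31"]),
}).set_index("dealer")  # assumption: dealer is used as the index here

# Group on the index explicitly via the level argument ...
latest = df.groupby(level="dealer")["date"].max()

# ... or pass the index name as if it were a column name.
latest_by_name = df.groupby("dealer")["date"].max()
print(latest)
```

Both calls produce the same Series of maximum dates, indexed by dealer.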
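You can use boolean indexing with groupby and transform. transform('max') broadcasts each dealer's maximum date back onto every row of that dealer, so comparing it with df['date'] gives a boolean mask that is True only for the rows holding the latest date: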
df_new = df[df.groupby('dealer').date.transform('max') == df['date']]
invoice_no dealer billing_change_previous_month date
1 100 1 -41981 2017-01-30
2 5505 2 0 2017-01-30
The solution works as expected even when there are more than two dealers (to address the question posted by Ben Smith):
import pandas as pd
df = pd.DataFrame({'invoice_no': [110, 100, 5505, 5635, 10000, 10001], 'dealer': [1, 1, 2, 2, 3, 3],
                   'billing_change_previous_month': [0, -41981, 0, 58730, 9000, 100],
                   'date': ['2016-12-31', '2017-01-30', '2017-01-30', '2016-12-31', '2019-12-31', '2020-01-31']})
df['date'] = pd.to_datetime(df['date'])  # parse the strings into real datetimes
df[df.groupby('dealer').date.transform('max') == df['date']]  # keep rows matching each dealer's latest date
invoice_no dealer billing_change_previous_month date
1 100 1 -41981 2017-01-30
2 5505 2 0 2017-01-30
5 10001 3 100 2020-01-31
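Note that the transform approach keeps every row that ties for a dealer's maximum date. If exactly one row per dealer is required, two alternatives could look like the sketch below. These are substitutes for the answer's method, not part of it, and the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "invoice_no": [110, 100, 5505, 5635],
    "dealer": [1, 1, 2, 2],
    "billing_change_previous_month": [0, -41981, 0, 58730],
    "date": ["2016-12-31", "2017-01-30", "2017-01-30", "2016-12-31"],
})
df["date"] = pd.to_datetime(df["date"])

# Option 1: idxmax returns the index label of the first maximum per group,
# so df.loc picks exactly one row for each dealer.
one_per_dealer = df.loc[df.groupby("dealer")["date"].idxmax()]

# Option 2: sort by date, then keep the last (latest) row seen for each dealer.
alt = (
    df.sort_values("date")
      .drop_duplicates("dealer", keep="last")
      .sort_values("dealer")
)
print(one_per_dealer)
```

On the question's data both options select invoices 100 and 5505, the same rows as the transform-based filter, because no dealer has two rows on its latest date.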