I have a dataframe which was created via a df.pivot:
type start end F_Type to_date A 20150908143000 345 316 B 20150908140300 NaN 480 20150908140600 NaN 120 20150908143000 10743 8803 C 20150908140100 NaN 1715 20150908140200 NaN 1062 20150908141000 NaN 145 20150908141500 418 NaN 20150908141800 NaN 450 20150908142900 1973 1499 20150908143000 19522 16659 D 20150908143000 433 65 E 20150908143000 7290 7375 F 20150908143000 0 0 G 20150908143000 1796 340
I would like to filter and return a single row for each 'F_TYPE' only returning the row with the Maximum 'to_date'. I would like to return the following dataframe:
type start end F_Type to_date A 20150908143000 345 316 B 20150908143000 10743 8803 C 20150908143000 19522 16659 D 20150908143000 433 65 E 20150908143000 7290 7375 F 20150908143000 0 0 G 20150908143000 1796 340
Thanks..
A standard approach is to use groupby(keys)[column].idxmax()
. However, to select the desired rows using idxmax
you need idxmax
to return unique index values. One way to obtain a unique index is to call reset_index
.
Once you obtain the index values from groupby(keys)[column].idxmax()
you can then select the entire row using df.loc
:
In [20]: df.loc[df.reset_index().groupby(['F_Type'])['to_date'].idxmax()] Out[20]: start end F_Type to_date A 20150908143000 345 316 B 20150908143000 10743 8803 C 20150908143000 19522 16659 D 20150908143000 433 65 E 20150908143000 7290 7375 F 20150908143000 0 0 G 20150908143000 1796 340
Note: idxmax
returns index labels, not necessarily ordinals. After using reset_index
the index labels happen to also be ordinals, but since idxmax
is returning labels (not ordinals) it is better to always use idxmax
in conjunction with df.loc
, not df.iloc
(as I originally did in this post.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With