How to group values of a pandas DataFrame and select the latest (by date) from each group?
For example, given a dataframe sorted by date:
    id  product        date
0  220     6647  2014-09-01
1  220     6647  2014-09-03
2  220     6647  2014-10-16
3  826     3380  2014-11-11
4  826     3380  2014-12-09
5  826     3380  2015-05-19
6  901     4555  2014-09-01
7  901     4555  2014-10-05
8  901     4555  2014-11-01
grouping by id or product, and selecting the latest gives:
    id  product        date
2  220     6647  2014-10-16
5  826     3380  2015-05-19
8  901     4555  2014-11-01
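Use iloc[] to select the last N columns of a pandas DataFrame by position, for example df.iloc[:, -N:]. Use [] with the last N column labels, for example df[df.columns[-N:]]. tail() operates on rows, so using it to select the last N columns requires transposing first, for example df.T.tail(N).T.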
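The three approaches named above can be sketched as follows. This is a minimal illustration only: the small frame and the variable names are made up for the example, and the tail() variant assumes the transpose trick, since tail() itself works on rows.

```python
import pandas as pd

# Small illustrative frame (values borrowed from the example in the question).
df = pd.DataFrame({
    "id": [220, 826, 901],
    "product": [6647, 3380, 4555],
    "date": ["2014-10-16", "2015-05-19", "2014-11-01"],
})

n = 2  # number of trailing columns to keep

# iloc[]: all rows, last n columns by position.
last_cols_iloc = df.iloc[:, -n:]

# []: select by passing the last n column labels.
last_cols_brackets = df[df.columns[-n:]]

# tail(): works on rows, so transpose, take the last n rows, transpose back.
# Transposing a mixed-dtype frame turns the columns into object dtype.
last_cols_tail = df.T.tail(n).T

print(list(last_cols_iloc.columns))
print(list(last_cols_brackets.columns))
print(list(last_cols_tail.columns))
```

Note that none of these select rows per group; they are about columns, so they do not answer the grouping question by themselves.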
Method 1: Using the tail() method. DataFrame.tail(n) returns the last n rows of the DataFrame. It takes one optional argument n (the number of rows to return from the end). By default n = 5, so it returns the last 5 rows if n is not passed to the method.
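A short sketch of the default and explicit forms of tail(), using a throwaway ten-row frame whose column name is invented for the example:

```python
import pandas as pd

# Ten rows with the default integer index 0..9.
df = pd.DataFrame({"value": range(10)})

last_five = df.tail()   # n is omitted, so the default of 5 applies
last_two = df.tail(2)   # explicit n

print(last_five.index.tolist())
print(last_two.index.tolist())
```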
groupby preserves the order of rows within each group. Its group_keys parameter, when calling apply, adds the group keys to the index to identify pieces. The squeeze parameter, available in older pandas versions, reduced the dimensionality of the return type if possible and otherwise returned a consistent type.
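The order-preservation property is what makes "sort, then take the last row of each group" work. A minimal sketch, assuming a deliberately unsorted four-row frame invented for the illustration:

```python
import pandas as pd

# Rows are intentionally not in date order.
df = pd.DataFrame({
    "id": [901, 220, 901, 220],
    "date": pd.to_datetime(
        ["2014-11-01", "2014-10-16", "2014-09-01", "2014-09-01"]
    ),
})

# Within each group, rows appear in the same relative order as in df.
orders = {int(key): group.index.tolist() for key, group in df.groupby("id")}
print(orders)

# Sorting by date first therefore puts the latest row last in every group.
latest = df.sort_values("date").groupby("id").tail(1)
print(latest)
```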
You can also use tail with groupby to get the last n values of the group:
df.sort_values('date').groupby('id').tail(1)

    id  product        date
2  220     6647  2014-10-16
8  901     4555  2014-11-01
5  826     3380  2015-05-19
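For a self-contained version, the frame from the question can be rebuilt and the same call run end to end. The idxmax variant at the end is not from the original answer; it is one possible alternative that avoids sorting, shown here as a sketch.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "id": [220] * 3 + [826] * 3 + [901] * 3,
    "product": [6647] * 3 + [3380] * 3 + [4555] * 3,
    "date": pd.to_datetime([
        "2014-09-01", "2014-09-03", "2014-10-16",
        "2014-11-11", "2014-12-09", "2015-05-19",
        "2014-09-01", "2014-10-05", "2014-11-01",
    ]),
})

# Approach from the answer: sort by date, then keep the last row per id.
# The result keeps the date-sorted row order, so ids are not grouped together.
latest_tail = df.sort_values("date").groupby("id").tail(1)
print(latest_tail)

# Alternative (an assumption, not part of the original answer):
# look up the index label of the maximum date in each group.
latest_idxmax = df.loc[df.groupby("id")["date"].idxmax()]
print(latest_idxmax)
```

Both approaches select rows 2, 5 and 8; they differ only in the order of the returned rows.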