I have an example DataFrame like the following:
import pandas as pd
import numpy as np
df = pd.DataFrame({'ID': [1, 2, 2, 2, 3, 3], 'date': np.array(['2000-01-01', '2002-01-01', '2010-01-01', '2003-01-01', '2004-01-01', '2008-01-01'], dtype='datetime64[D]')})
I am trying to get the second-earliest date in each ID group, so I wrote the following function:
def f(x):
    if len(x)==1:
        return x[0]
    else:
        x.sort()
        return x[1]
And then I wrote:
df.groupby('ID').date.apply(lambda x:f(x))
The result is an error.
Could you find a way to make this work?
Step 1: split the data into groups by creating a groupby object from the original DataFrame; Step 2: apply a function, in this case, an aggregation function that computes a summary statistic (you can also transform or filter your data in this step); Step 3: combine the results into a new DataFrame.
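The three steps above can be sketched by hand and compared with the built-in aggregation. This is only an illustration that reuses the question's data; the variable names (`grouped`, `earliest`, `manual`, `builtin`) are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 2, 2, 3, 3],
    'date': pd.to_datetime(['2000-01-01', '2002-01-01', '2010-01-01',
                            '2003-01-01', '2004-01-01', '2008-01-01']),
})

# Step 1: split the rows into groups keyed by ID.
grouped = df.groupby('ID')

# Step 2: apply an aggregation (here, the minimum date) to each group.
earliest = {key: group['date'].min() for key, group in grouped}

# Step 3: combine the per-group results into one object indexed by ID.
manual = pd.Series(earliest, name='date').rename_axis('ID')

# The built-in aggregation performs all three steps in a single call.
builtin = grouped['date'].min()
```

Both `manual` and `builtin` should hold the same earliest date per ID; in practice the one-line form is what you would normally write.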
Pandas DataFrame min() method: The min() method returns a Series with the minimum value of each column. By specifying the column axis (axis='columns'), the min() method searches column-wise and returns the minimum value for each row.
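A small sketch of the two axis settings, using a made-up frame (`scores` and its values are hypothetical, chosen only to make the minima easy to check):

```python
import pandas as pd

# Hypothetical data for illustration only.
scores = pd.DataFrame({'a': [3, 1, 4], 'b': [2, 7, 0]})

# Default axis: one minimum per column.
col_min = scores.min()

# axis='columns': one minimum per row.
row_min = scores.min(axis='columns')
```

Here `col_min` holds one value per column label (`a`, `b`), while `row_min` holds one value per row label (0, 1, 2).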
Groupby preserves the order of rows within each group.
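Because row order is preserved within each group, one option is to sort the whole frame by date first and then pick by position inside each group. The sketch below assumes that approach; `second_or_only` is a hypothetical helper, not something from the original post. It uses `.iloc` because, on a Series with an integer index, `x[0]` looks up the index label 0 rather than the first position.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 2, 2, 3, 3],
    'date': pd.to_datetime(['2000-01-01', '2002-01-01', '2010-01-01',
                            '2003-01-01', '2004-01-01', '2008-01-01']),
})

def second_or_only(dates):
    # Hypothetical helper: dates arrive already sorted, so position 1 is the
    # second-earliest; fall back to the only value for single-row groups.
    return dates.iloc[1] if len(dates) > 1 else dates.iloc[0]

# Sort once, then rely on groupby keeping that order inside each group.
result = df.sort_values('date').groupby('ID')['date'].apply(second_or_only)
```

This sorts the full column, so for very large groups a partial selection may be cheaper.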
This requires pandas 0.14.1 or later. It is quite efficient, especially if you have large groups, because nsmallest does not need to fully sort them.
In [32]: df.groupby('ID')['date'].nsmallest(2)
Out[32]:
ID
1   0   2000-01-01
2   1   2002-01-01
    3   2003-01-01
3   4   2004-01-01
    5   2008-01-01
dtype: datetime64[ns]
In [33]: df.groupby('ID')['date'].nsmallest(2).groupby(level='ID').last()
Out[33]:
ID
1    2000-01-01
2    2003-01-01
3    2008-01-01
dtype: datetime64[ns]
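For convenience, here is the same approach as a self-contained script. It is a sketch that builds the dates with `pd.to_datetime` instead of a NumPy array (an assumption made for portability across pandas versions); the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 2, 2, 3, 3],
    'date': pd.to_datetime(['2000-01-01', '2002-01-01', '2010-01-01',
                            '2003-01-01', '2004-01-01', '2008-01-01']),
})

# Up to two smallest dates per ID, indexed by (ID, original row label).
two_earliest = df.groupby('ID')['date'].nsmallest(2)

# Within each ID the values come back in ascending order, so the last one is
# the second-earliest date, or the only date for a single-row group.
second_earliest = two_earliest.groupby(level='ID').last()
```

Note that a group with a single row, such as ID 1 here, yields its only date.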