
Python pandas: select 2nd smallest value in groupby

Tags: python, pandas

I have an example DataFrame like the following:

import pandas as pd
import numpy as np
df = pd.DataFrame({'ID':[1,2,2,2,3,3], 'date':np.array(['2000-01-01','2002-01-01','2010-01-01','2003-01-01','2004-01-01','2008-01-01'],dtype='datetime64[D]')})

I am trying to get the 2nd earliest date in each ID group, so I wrote the following function:

def f(x):
    if len(x)==1:
        return x[0]
    else:
        x.sort()
        return x[1]

And then I wrote:

df.groupby('ID').date.apply(lambda x:f(x))

The result is an error.

Could you find a way to make this work?

midtownguru asked Jul 24 '14 21:07



1 Answer

This requires pandas 0.14.1 or later. It is quite efficient, especially if you have large groups, because it doesn't require fully sorting them.

In [32]: df.groupby('ID')['date'].nsmallest(2)
Out[32]: 
ID   
1   0   2000-01-01
2   1   2002-01-01
    3   2003-01-01
3   4   2004-01-01
    5   2008-01-01
dtype: datetime64[ns]

In [33]: df.groupby('ID')['date'].nsmallest(2).groupby(level='ID').last()
Out[33]: 
ID
1    2000-01-01
2    2003-01-01
3    2008-01-01
dtype: datetime64[ns]
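
For reference, here is a self-contained sketch that can be pasted into a script. It shows the nsmallest approach next to a repaired version of the question's function. The helper name second_smallest is hypothetical and not part of pandas. The sketch assumes a recent pandas where Series.sort_values is available, and it builds the dates with pd.to_datetime rather than the question's array call. The helper uses positional indexing (.iloc), so it does not depend on the index labels inside each group.

```python
import pandas as pd

# Same data as in the question; pd.to_datetime is used here for brevity.
df = pd.DataFrame({
    'ID': [1, 2, 2, 2, 3, 3],
    'date': pd.to_datetime(['2000-01-01', '2002-01-01', '2010-01-01',
                            '2003-01-01', '2004-01-01', '2008-01-01']),
})


def second_smallest(s):
    """Hypothetical helper: 2nd smallest value of s, or the only value if len(s) == 1."""
    ordered = s.sort_values()          # returns a sorted copy, leaves s untouched
    # .iloc is positional, so it works whatever the index labels are
    return ordered.iloc[min(1, len(ordered) - 1)]


# Approach 1: apply the helper to each group (fully sorts every group)
via_apply = df.groupby('ID')['date'].apply(second_smallest)

# Approach 2: the nsmallest chain from the answer (no full sort needed)
via_nsmallest = (
    df.groupby('ID')['date']
      .nsmallest(2)
      .groupby(level='ID')
      .last()
)

print(via_apply)
print(via_nsmallest)
```

Both approaches should give the same value per ID on this data. The nsmallest chain is likely the better choice for large groups, for the reason given above.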
Jeff answered Nov 08 '22 21:11