Groupby - taking last element - how do I keep NaNs?

Tags:

python

pandas

I have a DataFrame df, and I want to grab the most recent row for each CUSIP.

In [374]: df.head()
Out[374]: 
              CUSIP        COLA         COLB       COLC  
date                                                          
1992-05-08    AAA          238         4256      3.523346   
1992-07-13    AAA          NaN         4677      3.485577   
1992-12-12    BBB          221         5150      3.24
1995-12-12    BBB          254         5150      3.25
1997-12-12    BBB          245         NaN       3.25
1998-12-12    CCC          234         5140      3.24145
1999-12-12    CCC          223         5120      3.65145

I am using:

df = df.reset_index().groupby('CUSIP').last().reset_index().set_index('date')

I want this:

              CUSIP        COLA         COLB       COLC  
date           
1992-07-13    AAA          NaN         4677      3.485577      
1997-12-12    BBB          245         NaN       3.25
1999-12-12    CCC          223         5120      3.65145

Instead I am getting:

              CUSIP        COLA         COLB       COLC  
date           
1992-07-13    AAA          238         4677      3.485577      
1997-12-12    BBB          245         5150      3.25
1999-12-12    CCC          223         5120      3.65145

How do I get last() to take the last row of each group, including the NaNs?

Thank you.

user1911092 asked Dec 17 '13 20:12


1 Answer

last() returns the last non-null value in each column, so it skips NaNs. To take the actual last row of each group, use apply with iloc[-1] instead of last:

In [11]: df.reset_index().groupby('CUSIP').apply(lambda x: x.iloc[-1]).reset_index(drop=True).set_index('date')
Out[11]: 
           CUSIP  COLA  COLB      COLC
date                                  
1992-07-13   AAA   NaN  4677  3.485577
1997-12-12   BBB   245   NaN  3.250000
1999-12-12   CCC   223  5120  3.651450

[3 rows x 4 columns]
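
To see the difference between the two calls side by side, here is a minimal sketch that rebuilds the sample frame from the question. The variable names per_column_last and whole_row_last are made up for illustration, and the date index is left out of the apply result to keep the example short.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (values copied from df.head()).
df = pd.DataFrame(
    {
        "CUSIP": ["AAA", "AAA", "BBB", "BBB", "BBB", "CCC", "CCC"],
        "COLA": [238, np.nan, 221, 254, 245, 234, 223],
        "COLB": [4256, 4677, 5150, 5150, np.nan, 5140, 5120],
        "COLC": [3.523346, 3.485577, 3.24, 3.25, 3.25, 3.24145, 3.65145],
    },
    index=pd.DatetimeIndex(
        ["1992-05-08", "1992-07-13", "1992-12-12", "1995-12-12",
         "1997-12-12", "1998-12-12", "1999-12-12"],
        name="date",
    ),
)

# last() works column by column and skips NaN,
# so AAA/COLA falls back to 238 and BBB/COLB to 5150.
per_column_last = df.groupby("CUSIP").last()

# iloc[-1] takes each group's final row as a whole, NaN included.
value_columns = ["COLA", "COLB", "COLC"]
whole_row_last = df.groupby("CUSIP")[value_columns].apply(lambda g: g.iloc[-1])

print(per_column_last)
print(whole_row_last)
```

The first printout should match the unwanted result in the question, and the second should keep the NaN entries for AAA and BBB.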

From pandas 0.13 onward, a faster and more direct way is to use cumcount:

In [12]: df[df.groupby('CUSIP').cumcount(ascending=False) == 0]
Out[12]: 
           CUSIP  COLA  COLB      COLC
date                                  
1992-07-13   AAA   NaN  4677  3.485577
1997-12-12   BBB   245   NaN  3.250000
1999-12-12   CCC   223  5120  3.651450

[3 rows x 4 columns]
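
If it is not obvious why the mask works, this small sketch shows what cumcount(ascending=False) produces on a cut-down version of the question's data. The names reverse_position and last_rows are illustrative only.

```python
import numpy as np
import pandas as pd

# A cut-down version of the question's frame: two groups, one trailing NaN.
df = pd.DataFrame(
    {
        "CUSIP": ["AAA", "AAA", "BBB", "BBB", "BBB"],
        "COLA": [238, np.nan, 221, 254, 245],
    },
    index=pd.DatetimeIndex(
        ["1992-05-08", "1992-07-13", "1992-12-12", "1995-12-12", "1997-12-12"],
        name="date",
    ),
)

# Number the rows within each group, counting backwards from the end,
# so the last row of every group gets 0.
reverse_position = df.groupby("CUSIP").cumcount(ascending=False)
print(reverse_position.tolist())  # [1, 0, 2, 1, 0]

# Keep only the rows at reverse position 0. Nothing is aggregated,
# so NaN values and the original date index are left untouched.
last_rows = df[reverse_position == 0]
print(last_rows)
```

Because this is plain boolean indexing on the original frame, there is no need to reset and restore the date index as in the apply version.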

Andy Hayden answered Sep 30 '22 07:09