Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

get first and last values in a groupby

I have a dataframe df

df = pd.DataFrame(np.arange(20).reshape(10, -1),
                  [['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd'],
                   ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']],
                  ['X', 'Y'])

How do I get the first and last rows, grouped by the first level of the index?

I tried

df.groupby(level=0).agg(['first', 'last']).stack()

and got

          X   Y
a first   0   1
  last    6   7
b first   8   9
  last   12  13
c first  14  15
  last   16  17
d first  18  19
  last   18  19

This is so close to what I want. How can I preserve the level 1 index and get this instead:

      X   Y
a a   0   1
  d   6   7
b e   8   9
  g  12  13
c h  14  15
  i  16  17
d j  18  19
  j  18  19
like image 596
Brian Avatar asked Aug 05 '16 20:08

Brian


People also ask

How do you find the first value of Groupby?

groupby. nth() function is used to get the value corresponding the nth row for each group. To get the first value in a group, pass 0 as an argument to the nth() function.

How do you sort values after Groupby?

Sort Values in Descending Order with Groupby You can sort values in descending order by using ascending=False param to sort_values() method. The head() function is used to get the first n rows. It is useful for quickly testing if your object has the right type of data in it.

What is AGG in Groupby?

agg is an alias for aggregate . Use the alias. A passed user-defined-function will be passed a Series for evaluation. The aggregation is for each column.

What is possible using Groupby () method of pandas?

groupby() function is used to split the data into groups based on some criteria. pandas objects can be split on any of their axes. The abstract definition of grouping is to provide a mapping of labels to group names. sort : Sort group keys.


3 Answers

Option 1

def first_last(df):     return df.ix[[0, -1]]  df.groupby(level=0, group_keys=False).apply(first_last) 

enter image description here


Option 2 - only works if index is unique

idx = df.index.to_series().groupby(level=0).agg(['first', 'last']).stack() df.loc[idx] 

Option 3 - per notes below, this only makes sense when there are no NAs

I also abused the agg function. The code below works, but is far uglier.

df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \     .set_index('level_1', append=True).reset_index(1, drop=True) \     .rename_axis([None, None]) 

Note

per @unutbu: agg(['first', 'last']) take the firs non-na values.

I interpreted this as, it must then be necessary to run this column by column. Further, forcing index level=1 to align may not even make sense.

Let's include another test

df = pd.DataFrame(np.arange(20).reshape(10, -1),                   [list('aaaabbbccd'),                    list('abcdefghij')],                   list('XY'))  df.loc[tuple('aa'), 'X'] = np.nan 

def first_last(df):     return df.ix[[0, -1]]  df.groupby(level=0, group_keys=False).apply(first_last) 

enter image description here

df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \     .set_index('level_1', append=True).reset_index(1, drop=True) \     .rename_axis([None, None]) 

enter image description here

Sure enough! This second solution is taking the first valid value in column X. It is now nonsensical to have forced that value to align with the index a.

like image 105
piRSquared Avatar answered Oct 05 '22 23:10

piRSquared


This could be on of the easy solution.

df.groupby(level = 0, as_index= False).nth([0,-1])        X   Y a a   0   1   d   6   7 b e   8   9   g  12  13 c h  14  15   i  16  17 d j  18  19 

Hope this helps. (Y)

like image 40
Akarsh Jain Avatar answered Oct 05 '22 21:10

Akarsh Jain


Please try this:

For last value: df.groupby('Column_name').nth(-1),

For first value: df.groupby('Column_name').nth(0)

like image 37
nat23dip Avatar answered Oct 05 '22 23:10

nat23dip