get first and last values in a groupby

Tags:

python

pandas

dataframe

group-by

pandas-groupby

I have a dataframe df

df = pd.DataFrame(np.arange(20).reshape(10, -1),
                  [['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd'],
                   ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']],
                  ['X', 'Y'])

How do I get the first and last rows, grouped by the first level of the index?

I tried

df.groupby(level=0).agg(['first', 'last']).stack()

and got

          X   Y
a first   0   1
  last    6   7
b first   8   9
  last   12  13
c first  14  15
  last   16  17
d first  18  19
  last   18  19

This is so close to what I want. How can I preserve the level 1 index and get this instead:

      X   Y
a a   0   1
  d   6   7
b e   8   9
  g  12  13
c h  14  15
  i  16  17
d j  18  19
  j  18  19

596

asked Aug 05 '16 20:08

Brian

3 Answers

Option 1

def first_last(df):
    # .ix was removed in pandas 1.0; .iloc is the positional equivalent
    return df.iloc[[0, -1]]

df.groupby(level=0, group_keys=False).apply(first_last)
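For readers who want to run Option 1 end to end, here is a minimal self-contained sketch. It assumes a current pandas release, where `.iloc` stands in for the removed `.ix`; the names `group` and `result` are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(10, -1),
                  [list('aaaabbbccd'), list('abcdefghij')],
                  ['X', 'Y'])

def first_last(group):
    # Positional selection: first and last row of each group.
    # A single-row group (here 'd') yields the same row twice.
    return group.iloc[[0, -1]]

# group_keys=False keeps the original two-level index instead of
# prepending the group key as an extra level.
result = df.groupby(level=0, group_keys=False).apply(first_last)
print(result)
```

Because the selection is positional, the level 1 labels stay attached to the rows they came from, and the lone row of group d appears twice, as the question asks.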


Option 2 - only works if index is unique

idx = df.index.to_series().groupby(level=0).agg(['first', 'last']).stack()
df.loc[idx]
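The uniqueness caveat on Option 2 comes from how label-based lookup behaves. The sketch below uses a hypothetical flat-index frame `dup` (not the question's data) to show that `.loc` returns every row carrying a requested label, so a "first" label looked up this way would also pull in any later rows that share it.

```python
import pandas as pd

# Hypothetical frame with a repeated index label 'k'.
dup = pd.DataFrame({'X': [1, 2, 3]}, index=['k', 'k', 'm'])

# Asking for the label once still returns every row that carries it.
picked = dup.loc[['k']]
print(picked)
```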

Option 3 - per notes below, this only makes sense when there are no NAs

I also abused the agg function. The code below works, but is far uglier.

df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \
    .set_index('level_1', append=True).reset_index(1, drop=True) \
    .rename_axis([None, None])

Note

Per @unutbu: agg(['first', 'last']) takes the first and last non-NA values.

I interpret this to mean that the aggregation runs column by column, so the values in one output row can come from different input rows. Further, forcing index level 1 to align with them may not even make sense.

Let's include another test

df = pd.DataFrame(np.arange(20).reshape(10, -1),
                  [list('aaaabbbccd'),
                   list('abcdefghij')],
                  list('XY'))

df.loc[tuple('aa'), 'X'] = np.nan

def first_last(df):
    # .ix was removed in pandas 1.0; .iloc is the positional equivalent
    return df.iloc[[0, -1]]

df.groupby(level=0, group_keys=False).apply(first_last)


df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \
    .set_index('level_1', append=True).reset_index(1, drop=True) \
    .rename_axis([None, None])


Sure enough! The second solution takes the first valid (non-NA) value in column X. Having forced that value to align with the level 1 index a is now nonsensical, because the value comes from a different row.
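The difference can be checked numerically with a small sketch. Two things here are assumptions rather than part of the answer: the frame is built with float values so that assigning NaN needs no dtype change, and `head(1)` is used as the positional stand-in for "the actual first row".

```python
import numpy as np
import pandas as pd

# Same layout as the test frame above, but float-valued (an assumption)
# so that NaN can be assigned without upcasting the column.
df = pd.DataFrame(np.arange(20, dtype=float).reshape(10, -1),
                  [list('aaaabbbccd'), list('abcdefghij')],
                  list('XY'))
df.loc[tuple('aa'), 'X'] = np.nan

# first() skips missing values independently in each column.
by_agg = df.groupby(level=0).first()

# head(1) returns the actual first row of each group, NaN included.
by_pos = df.groupby(level=0).head(1)

print(by_agg.loc['a'])         # X comes from row (a, b), Y from row (a, a)
print(by_pos.loc[('a', 'a')])  # X is NaN, Y is from the same row
```

For group a, the aggregated row mixes X from one input row with Y from another, which is why no single level 1 label describes it.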

105

answered Oct 05 '22 23:10

piRSquared

This could be one of the easier solutions.

df.groupby(level=0, as_index=False).nth([0, -1])

      X   Y
a a   0   1
  d   6   7
b e   8   9
  g  12  13
c h  14  15
  i  16  17
d j  18  19

Hope this helps. (Y)
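Note that the output above lists the single-row group d once, whereas the question's desired output lists it twice. If the repeated row matters, one possible variation (a swapped-in technique, not the one from this answer) is to concatenate `head(1)` and `tail(1)`. A minimal sketch, with `grouped` and `both` as illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(10, -1),
                  [list('aaaabbbccd'), list('abcdefghij')],
                  ['X', 'Y'])

grouped = df.groupby(level=0)

# head(1) and tail(1) both keep the original two-level index, so
# concatenating them repeats the row of any single-row group.
both = pd.concat([grouped.head(1), grouped.tail(1)]).sort_index()
print(both)
```

Sorting the index puts each group's first and last rows next to each other. This relies on the index already being in sorted order, as it is in the question's frame.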

answered Oct 05 '22 21:10

Akarsh Jain

Please try this:

For last value: df.groupby('Column_name').nth(-1),

For first value: df.groupby('Column_name').nth(0)
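A small runnable sketch of the two calls, using hypothetical data in which `Column_name` is the placeholder from the answer and `value` is an invented column. The shape of the returned index differs between pandas versions, so the example only reads the `value` column.

```python
import pandas as pd

# Hypothetical data; 'Column_name' mirrors the placeholder above.
df = pd.DataFrame({'Column_name': ['a', 'a', 'b', 'b', 'b'],
                   'value': [1, 2, 3, 4, 5]})

grouped = df.groupby('Column_name')

first_rows = grouped.nth(0)   # first row of each group
last_rows = grouped.nth(-1)   # last row of each group

print(first_rows['value'].tolist())
print(last_rows['value'].tolist())
```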

answered Oct 05 '22 23:10

nat23dip
