I have a dataframe df
df = pd.DataFrame(np.arange(20).reshape(10, -1),
[['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd'],
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']],
['X', 'Y'])
How do I get the first and last rows, grouped by the first level of the index?
I tried
df.groupby(level=0).agg(['first', 'last']).stack()
and got
X Y
a first 0 1
last 6 7
b first 8 9
last 12 13
c first 14 15
last 16 17
d first 18 19
last 18 19
This is so close to what I want. How can I preserve the level 1 index and get this instead:
X Y
a a 0 1
d 6 7
b e 8 9
g 12 13
c h 14 15
i 16 17
d j 18 19
j 18 19
groupby. nth() function is used to get the value corresponding the nth row for each group. To get the first value in a group, pass 0 as an argument to the nth() function.
Sort Values in Descending Order with Groupby You can sort values in descending order by using ascending=False param to sort_values() method. The head() function is used to get the first n rows. It is useful for quickly testing if your object has the right type of data in it.
agg is an alias for aggregate . Use the alias. A passed user-defined-function will be passed a Series for evaluation. The aggregation is for each column.
groupby() function is used to split the data into groups based on some criteria. pandas objects can be split on any of their axes. The abstract definition of grouping is to provide a mapping of labels to group names. sort : Sort group keys.
def first_last(df): return df.ix[[0, -1]] df.groupby(level=0, group_keys=False).apply(first_last)
idx = df.index.to_series().groupby(level=0).agg(['first', 'last']).stack() df.loc[idx]
I also abused the agg
function. The code below works, but is far uglier.
df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \ .set_index('level_1', append=True).reset_index(1, drop=True) \ .rename_axis([None, None])
per @unutbu: agg(['first', 'last'])
take the firs non-na values.
I interpreted this as, it must then be necessary to run this column by column. Further, forcing index level=1 to align may not even make sense.
Let's include another test
df = pd.DataFrame(np.arange(20).reshape(10, -1), [list('aaaabbbccd'), list('abcdefghij')], list('XY')) df.loc[tuple('aa'), 'X'] = np.nan
def first_last(df): return df.ix[[0, -1]] df.groupby(level=0, group_keys=False).apply(first_last)
df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \ .set_index('level_1', append=True).reset_index(1, drop=True) \ .rename_axis([None, None])
Sure enough! This second solution is taking the first valid value in column X. It is now nonsensical to have forced that value to align with the index a.
This could be on of the easy solution.
df.groupby(level = 0, as_index= False).nth([0,-1]) X Y a a 0 1 d 6 7 b e 8 9 g 12 13 c h 14 15 i 16 17 d j 18 19
Hope this helps. (Y)
Please try this:
For last value: df.groupby('Column_name').nth(-1)
,
For first value: df.groupby('Column_name').nth(0)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With