I'm posting this because the topic just got brought up in another question/answer and the behavior isn't very well documented.
Consider the dataframe df
df = pd.DataFrame(dict(
A=list('xxxyyy'),
B=[np.nan, 1, 2, 3, 4, np.nan]
))
A B
0 x NaN
1 x 1.0
2 x 2.0
3 y 3.0
4 y 4.0
5 y NaN
I wanted to get the first and last rows of each group defined by column 'A'
.
I tried
df.groupby('A').B.agg(['first', 'last'])
first last
A
x 1.0 2.0
y 3.0 4.0
However, This doesn't give me the np.NaN
s that I expected.
How do I get the actual first and last values in each group?
Groupby preserves the order of rows within each group. Unfortunately the answer to this question is NO.
Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.
groupby. nth() function is used to get the value corresponding the nth row for each group. To get the first value in a group, pass 0 as an argument to the nth() function.
Sort Values in Descending Order with Groupby You can sort values in descending order by using ascending=False param to sort_values() method. The head() function is used to get the first n rows. It is useful for quickly testing if your object has the right type of data in it. Yields below output.
As noted here by @unutbu:
The groupby.first and groupby.last methods return the first and last non-null values respectively.
To get the actual first and last values, do:
def h(x):
return x.values[0]
def t(x):
return x.values[-1]
df.groupby('A').B.agg([h, t])
h t
A
x NaN 2.0
y 3.0 NaN
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With