I have a dataframe like this:
Index STNAME COUNTY COUNTY_POP
0 AL 0 100
1 AL 1 150
2 AL 3 200
3 AL 5 50
...
15 CA 0 300
16 CA 1 200
17 CA 3 250
18 CA 4 350
I want to sum the three largest integers from COUNTY_POP for each state. So far, I have:
In[]: df.groupby(['STNAME'])['COUNTY_POP'].nlargest(3)
Out[]:
Index STNAME COUNTY COUNTY_POP
0 AL 0 100
1 AL 1 150
2 AL 3 200
...
15 CA 0 300
17 CA 3 250
18 CA 4 350
However when I add the .sum() operation to the above code, I receive the following output.
In[]: df.groupby(['STNAME'])['COUNTY_POP'].nlargest(3).sum()
Out[]:
1350
I'm relatively new to Python and Pandas. If anyone could explain what causes this and how to correct it, I'd really appreciate it!
Is that what you want?
In [25]: df.groupby('STNAME')['COUNTY_POP'].agg(lambda x: x.nlargest(3).sum())
Out[25]:
STNAME
AL 450
CA 900
Name: COUNTY_POP, dtype: int64
presort and slice... a tad faster
df.sort_values('COUNTY_POP').groupby('STNAME').COUNTY_POP \
.apply(lambda x: x.values[-3:].sum())
STNAME
AL 450
CA 900
Name: COUNTY_POP, dtype: int64
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With