I have a dataframe built from values in a file. I have grouped it by two columns, which returns a count for each group. Now I want to sort by the count, largest first, but I get the following error:
KeyError: 'count'
It looks like the count column produced by the groupby/agg is some sort of index, so I'm not sure how to do this. I'm a beginner with Python and pandas. Here's the actual code; please let me know if you need more detail:
    def answer_five():
        df = census_df#.set_index(['STNAME'])
        df = df[df['SUMLEV'] == 50]
        df = df[['STNAME','CTYNAME']].groupby(['STNAME']).agg(['count']).sort(['count'])
        #df.set_index(['count'])
        print(df.index)
        # get sorted count max item
        return df.head(5)
Use count() by column name: use pandas DataFrame.groupby() to group the rows by a column, then call count() to get the number of values in each group, ignoring None and NaN. It works with non-floating-point data as well.
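As a minimal sketch of that groupby-then-count pattern: the column names below mirror the question, but the values are made up for illustration and are not the census data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the census file from the question).
df = pd.DataFrame({
    "STNAME": ["a", "a", "b", "b", "b"],
    "CTYNAME": ["x", None, "y", np.nan, "z"],
})

# Count the non-missing CTYNAME entries within each STNAME group.
# Both None and NaN are skipped, and the column holds strings, not floats.
counts = df.groupby("STNAME")["CTYNAME"].count()
print(counts)
```

Selecting the column before calling count() yields a Series indexed by the group key, which is easier to sort than a frame with nested column labels.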
Pandas DataFrame count() method: count() counts the non-empty values in each column by default, or in each row if you specify axis='columns', and returns a Series with one result per column (or row).
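A short sketch of the two axis settings, using a made-up frame (the names frame, x and y are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each column.
frame = pd.DataFrame({
    "x": [1.0, np.nan, 3.0],
    "y": ["p", "q", None],
})

# Default axis: one count of non-missing values per column.
per_column = frame.count()

# axis='columns': one count of non-missing values per row.
per_row = frame.count(axis="columns")

print(per_column)
print(per_row)
```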
To sort the DataFrame based on the values in a single column, you'll use .sort_values(). By default, this returns a new DataFrame sorted in ascending order; it does not modify the original DataFrame.
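A small sketch of that behaviour, assuming a made-up frame named scores (not part of the question):

```python
import pandas as pd

# Hypothetical data for illustration.
scores = pd.DataFrame({"name": ["a", "b", "c"], "count": [2, 5, 3]})

# Ascending by default; the result is a new DataFrame.
ascending = scores.sort_values("count")

# Pass ascending=False to put the largest values first.
descending = scores.sort_values("count", ascending=False)

print(descending)
print(scores)  # the original row order is unchanged
```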
I think you need to add reset_index, then pass the parameter ascending=False to sort_values, because sort returns:

    FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
      .sort_values(['count'], ascending=False)
    df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'] \
                                 .count() \
                                 .reset_index(name='count') \
                                 .sort_values(['count'], ascending=False) \
                                 .head(5)
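To see where the KeyError comes from, here is a rough sketch on made-up data (not the census file). It assumes a reasonably recent pandas: agg(['count']) on the whole frame produces two-level column labels, so there is no column named plain 'count', whereas reset_index(name='count') produces an ordinary flat column.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the question.
df = pd.DataFrame({
    "STNAME": ["a", "b", "b"],
    "CTYNAME": ["x", "y", "z"],
})

# The question's approach: the columns become a MultiIndex.
agg = df[["STNAME", "CTYNAME"]].groupby(["STNAME"]).agg(["count"])
print(agg.columns.tolist())

# 'count' on its own is not a column label here, hence the KeyError.
has_flat_count = "count" in agg.columns

# Counting a single selected column and naming the result gives flat columns.
flat = df.groupby("STNAME")["CTYNAME"].count().reset_index(name="count")
print(flat.columns.tolist())
```

With flat columns, sort_values(['count'], ascending=False) can find the column by its name.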
Sample:
    df = pd.DataFrame({'STNAME':list('abscscbcdbcsscae'),
                       'CTYNAME':[4,5,6,5,6,2,3,4,5,6,4,5,4,3,6,5]})

    print (df)
        CTYNAME STNAME
    0         4      a
    1         5      b
    2         6      s
    3         5      c
    4         6      s
    5         2      c
    6         3      b
    7         4      c
    8         5      d
    9         6      b
    10        4      c
    11        5      s
    12        4      s
    13        3      c
    14        6      a
    15        5      e

    df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'] \
                                 .count() \
                                 .reset_index(name='count') \
                                 .sort_values(['count'], ascending=False) \
                                 .head(5)

    print (df)
      STNAME  count
    2      c      5
    5      s      4
    1      b      3
    0      a      2
    3      d      1
But it seems you need Series.nlargest:
df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'].count().nlargest(5)
or:
df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'].size().nlargest(5)
The difference between size and count is: size counts NaN values, count does not.
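A minimal sketch of that difference, using made-up data with one missing CTYNAME value (an assumption for illustration; the sample in this answer has no missing values, so there both methods give the same numbers):

```python
import numpy as np
import pandas as pd

# Hypothetical data: group 'a' has one missing CTYNAME.
df = pd.DataFrame({
    "STNAME": ["a", "a", "b"],
    "CTYNAME": ["x", np.nan, "y"],
})

grouped = df.groupby("STNAME")["CTYNAME"]

sizes = grouped.size()    # rows per group, NaN included
counts = grouped.count()  # non-missing values per group

print(sizes)
print(counts)
```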
Sample:
    df = pd.DataFrame({'STNAME':list('abscscbcdbcsscae'),
                       'CTYNAME':[4,5,6,5,6,2,3,4,5,6,4,5,4,3,6,5]})

    print (df)
        CTYNAME STNAME
    0         4      a
    1         5      b
    2         6      s
    3         5      c
    4         6      s
    5         2      c
    6         3      b
    7         4      c
    8         5      d
    9         6      b
    10        4      c
    11        5      s
    12        4      s
    13        3      c
    14        6      a
    15        5      e

    df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'] \
                                 .size() \
                                 .nlargest(5) \
                                 .reset_index(name='top5')

    print (df)
      STNAME  top5
    0      c     5
    1      s     4
    2      b     3
    3      a     2
    4      d     1