
Count and Sort with Pandas

I have a dataframe of values from a file, which I have grouped by two columns to return a count for each group. Now I want to sort by the count value, with the maximum first, but I get the following error:

KeyError: 'count'

It looks like the count column produced by the groupby/agg is some sort of index, so I'm not sure how to do this. I'm a beginner with Python and pandas. Here's the actual code; please let me know if you need more detail:

    def answer_five():
        df = census_df#.set_index(['STNAME'])
        df = df[df['SUMLEV'] == 50]
        df = df[['STNAME','CTYNAME']].groupby(['STNAME']).agg(['count']).sort(['count'])
        #df.set_index(['count'])
        print(df.index)
        # get sorted count max item
        return df.head(5)
Rubans asked Nov 06 '16 20:11




1 Answer

I think you need to add reset_index and then pass ascending=False to sort_values, because sort returns:

    FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)

    .sort_values(['count'], ascending=False)

    df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'] \
                                 .count() \
                                 .reset_index(name='count') \
                                 .sort_values(['count'], ascending=False) \
                                 .head(5)

Sample:

    import pandas as pd

    df = pd.DataFrame({'STNAME':list('abscscbcdbcsscae'),
                       'CTYNAME':[4,5,6,5,6,2,3,4,5,6,4,5,4,3,6,5]})

    print (df)
        CTYNAME STNAME
    0         4      a
    1         5      b
    2         6      s
    3         5      c
    4         6      s
    5         2      c
    6         3      b
    7         4      c
    8         5      d
    9         6      b
    10        4      c
    11        5      s
    12        4      s
    13        3      c
    14        6      a
    15        5      e

    # count rows per STNAME, turn the result into a flat frame, sort descending
    df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'] \
                                 .count() \
                                 .reset_index(name='count') \
                                 .sort_values(['count'], ascending=False) \
                                 .head(5)

    print (df)
      STNAME  count
    2      c      5
    5      s      4
    1      b      3
    0      a      2
    3      d      1
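As a rough illustration of why the original call raised KeyError: 'count', the sketch below inspects what agg(['count']) returns on the same sample data. The variable names nested and flat are only for illustration. With a list of functions, the function name ends up as a second level under each aggregated column, so there is no top-level column called 'count' to sort by.

```python
import pandas as pd

df = pd.DataFrame({
    'STNAME': list('abscscbcdbcsscae'),
    'CTYNAME': [4, 5, 6, 5, 6, 2, 3, 4, 5, 6, 4, 5, 4, 3, 6, 5],
})

# agg(['count']) nests the function name under each aggregated column,
# and the grouping key STNAME becomes the index.
nested = df[['STNAME', 'CTYNAME']].groupby(['STNAME']).agg(['count'])
print(nested.columns.tolist())  # [('CTYNAME', 'count')]
print(nested.index.name)        # STNAME

# The nested column can still be sorted by its full (column, function) key.
by_tuple = nested.sort_values(('CTYNAME', 'count'), ascending=False)

# Selecting the column first and naming the result gives a flat 'count' column.
flat = df.groupby('STNAME')['CTYNAME'].count().reset_index(name='count')
print(flat.columns.tolist())    # ['STNAME', 'count']
```

Either route works; the flat version is usually easier to read and to sort with sort_values(['count'], ascending=False).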

But it seems you need Series.nlargest:

    df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'].count().nlargest(5)

or:

    df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'].size().nlargest(5)

The difference between size and count is:

size counts NaN values, count does not.
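A minimal sketch of that difference, using a small made-up frame (the data here is hypothetical, not from the census file) in which some CTYNAME values are missing:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing CTYNAME in each group
df = pd.DataFrame({
    'STNAME': ['a', 'a', 'a', 'b', 'b'],
    'CTYNAME': ['x', np.nan, 'y', np.nan, 'z'],
})

grouped = df.groupby('STNAME')['CTYNAME']

sizes = grouped.size()    # rows per group, NaN included
counts = grouped.count()  # non-NaN values per group

print(sizes.to_dict())    # {'a': 3, 'b': 2}
print(counts.to_dict())   # {'a': 2, 'b': 1}
```

If the counted column has no missing values, both give the same numbers, so the choice only matters when NaN can appear.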

Sample:

    df = pd.DataFrame({'STNAME':list('abscscbcdbcsscae'),
                       'CTYNAME':[4,5,6,5,6,2,3,4,5,6,4,5,4,3,6,5]})

    print (df)
        CTYNAME STNAME
    0         4      a
    1         5      b
    2         6      s
    3         5      c
    4         6      s
    5         2      c
    6         3      b
    7         4      c
    8         5      d
    9         6      b
    10        4      c
    11        5      s
    12        4      s
    13        3      c
    14        6      a
    15        5      e

    # parentheses let the method chain span several lines
    df = (df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME']
                                  .size()
                                  .nlargest(5)
                                  .reset_index(name='top5'))
    print (df)
      STNAME  top5
    0      c     5
    1      s     4
    2      b     3
    3      a     2
    4      d     1
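Putting this back into the shape of the function from the question might look like the sketch below. The census_df here is a tiny hypothetical stand-in (the real file is not shown in the question), and the parameter n is added only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the asker's census_df; only the three
# columns used in the question are included.
census_df = pd.DataFrame({
    'SUMLEV': [40, 50, 50, 50, 40, 50, 50, 40, 50],
    'STNAME': ['Alabama', 'Alabama', 'Alabama', 'Alabama',
               'Alaska', 'Alaska', 'Alaska',
               'Wyoming', 'Wyoming'],
    'CTYNAME': ['Alabama', 'Autauga County', 'Baldwin County', 'Barbour County',
                'Alaska', 'Anchorage', 'Juneau',
                'Wyoming', 'Albany County'],
})

def answer_five(n=5):
    # Keep only the SUMLEV == 50 rows, as in the question
    rows = census_df[census_df['SUMLEV'] == 50]
    # Count CTYNAME per STNAME and keep the n largest counts
    return rows.groupby('STNAME')['CTYNAME'].count().nlargest(n)

top = answer_five()
print(top)
```

The result is a Series indexed by STNAME, so top.index[0] gives the state with the highest count and top.iloc[0] gives the count itself.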
jezrael answered Sep 18 '22 08:09