Count and Sort with Pandas

I have a DataFrame of values loaded from a file. I grouped it by two columns, which returns a count for each group. Now I want to sort by the count, largest first, but I get the following error:

KeyError: 'count'

It looks like the count column produced by the groupby/agg is some sort of index, so I'm not sure how to do this. I'm a beginner with Python and pandas. Here's the actual code; please let me know if you need more detail:

    def answer_five():
        df = census_df  # .set_index(['STNAME'])
        df = df[df['SUMLEV'] == 50]
        df = df[['STNAME','CTYNAME']].groupby(['STNAME']).agg(['count']).sort(['count'])
        # df.set_index(['count'])
        print(df.index)
        # get sorted count max item
        return df.head(5)

asked Nov 06 '16 20:11

Rubans

1 Answer

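I think you need to add reset_index and then pass ascending=False to sort_values. With agg(['count']) the result has two-level column labels, ('CTYNAME', 'count'), so there is no plain 'count' column to sort by. In addition, sort is deprecated and returns: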

FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)

    .sort_values(['count'], ascending=False)

    df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'] \
                                 .count() \
                                 .reset_index(name='count') \
                                 .sort_values(['count'], ascending=False) \
                                 .head(5)

Sample:

    import pandas as pd

    df = pd.DataFrame({'STNAME':list('abscscbcdbcsscae'),
                       'CTYNAME':[4,5,6,5,6,2,3,4,5,6,4,5,4,3,6,5]})

    print (df)
        CTYNAME STNAME
    0         4      a
    1         5      b
    2         6      s
    3         5      c
    4         6      s
    5         2      c
    6         3      b
    7         4      c
    8         5      d
    9         6      b
    10        4      c
    11        5      s
    12        4      s
    13        3      c
    14        6      a
    15        5      e

    # count the rows per state, then sort by that count, largest first
    df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'] \
                                 .count() \
                                 .reset_index(name='count') \
                                 .sort_values(['count'], ascending=False) \
                                 .head(5)

    print (df)
      STNAME  count
    2      c      5
    5      s      4
    1      b      3
    0      a      2
    3      d      1
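
To see where the KeyError probably comes from, it can help to inspect the column labels each approach produces. The following is only a sketch on the sample data above; the variable names agg_df, by_tuple and flat are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'STNAME': list('abscscbcdbcsscae'),
                   'CTYNAME': [4, 5, 6, 5, 6, 2, 3, 4, 5, 6, 4, 5, 4, 3, 6, 5]})

# Passing a list to agg() produces two-level column labels.
agg_df = df[['STNAME', 'CTYNAME']].groupby(['STNAME']).agg(['count'])
print(agg_df.columns.tolist())   # [('CTYNAME', 'count')]

# Sorting by the full two-level label works on that frame ...
by_tuple = agg_df.sort_values(('CTYNAME', 'count'), ascending=False)

# ... while count() + reset_index(name='count') gives a flat 'count' column.
flat = (df.groupby('STNAME')['CTYNAME']
          .count()
          .reset_index(name='count')
          .sort_values('count', ascending=False))
print(flat.columns.tolist())     # ['STNAME', 'count']
```

Either route sorts the counts; the flat version is usually easier to work with afterwards because 'count' is an ordinary column.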

But it seems what you actually need is Series.nlargest:

df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'].count().nlargest(5)

or:

df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'].size().nlargest(5)

The difference between size and count is:

size counts NaN values, count does not.
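
A small sketch of that difference, using made-up data with missing values in CTYNAME (the values 'x', 'y', 'z' are placeholders, not real county names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'STNAME': ['a', 'a', 'b', 'b', 'b'],
                   'CTYNAME': ['x', np.nan, 'y', 'z', np.nan]})

grouped = df.groupby('STNAME')['CTYNAME']
sizes = grouped.size()    # rows per group, NaN included
counts = grouped.count()  # non-NaN values per group

print(sizes.to_dict())    # {'a': 2, 'b': 3}
print(counts.to_dict())   # {'a': 1, 'b': 2}
```

If CTYNAME has no missing values, both give the same numbers.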

Sample:

    df = pd.DataFrame({'STNAME':list('abscscbcdbcsscae'),
                       'CTYNAME':[4,5,6,5,6,2,3,4,5,6,4,5,4,3,6,5]})

    print (df)
        CTYNAME STNAME
    0         4      a
    1         5      b
    2         6      s
    3         5      c
    4         6      s
    5         2      c
    6         3      b
    7         4      c
    8         5      d
    9         6      b
    10        4      c
    11        5      s
    12        4      s
    13        3      c
    14        6      a
    15        5      e

    # the line continuations (\) are required for the chained calls to parse
    df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'] \
                                 .size() \
                                 .nlargest(5) \
                                 .reset_index(name='top5')
    print (df)
      STNAME  top5
    0      c     5
    1      s     4
    2      b     3
    3      a     2
    4      d     1
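
Applied to the function from the question, this might look roughly like the sketch below. The census_df here is a tiny made-up stand-in that only reuses the column names SUMLEV, STNAME and CTYNAME from the question, and passing the frame in as a parameter (plus the n argument) is my own assumption, not part of the original code.

```python
import pandas as pd

# Made-up stand-in for the census data (hypothetical values).
census_df = pd.DataFrame({
    'SUMLEV': [40, 50, 50, 50, 40, 50, 50, 40, 50],
    'STNAME': ['Alabama', 'Alabama', 'Alabama', 'Alabama',
               'Alaska', 'Alaska', 'Alaska',
               'Arizona', 'Arizona'],
    'CTYNAME': ['Alabama', 'County A', 'County B', 'County C',
                'Alaska', 'County D', 'County E',
                'Arizona', 'County F'],
})

def answer_five(df, n=5):
    # Keep only the rows the question filters on (SUMLEV == 50).
    counties = df[df['SUMLEV'] == 50]
    # Count rows per state and keep the n largest counts.
    return (counties.groupby('STNAME')['CTYNAME']
                    .size()
                    .nlargest(n)
                    .reset_index(name='count'))

top = answer_five(census_df)
print(top)
```

With this stand-in data the result lists Alabama (3), Alaska (2) and Arizona (1), in that order.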

answered Sep 18 '22 08:09

jezrael

Tags:

python

sorting

pandas

count

group-by