Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Pandas sorting after groupby and aggregate

I am trying to sort data (Pandas) after grouping and aggregating and I am stuck. My data:

data = {'from_year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
    'name': ['John', 'John1', 'John', 'John', 'John4', 'John', 'John1', 'John6'],
    'out_days': [11, 8, 10, 15, 11, 6, 10, 4]}
persons = pd.DataFrame(data, columns=["from_year", "name", "out_days"])

days_off_yearly = persons.groupby(["from_year", "name"]).agg({"out_days": [np.sum]})

print(days_off_yearly)

After that I have my data sorted:

                out_days
                     sum
from_year name          
2010      John        17
2011      John        15
          John1       18
2012      John        10
          John4       11
          John6        4

I want to sort my data by from_year and out_days sum and expecting data to be:

                out_days
                     sum
from_year name          
2012      John4       11
          John        10
          John6        4    
2011      John1       18
          John        15
2010      John        17

I am trying

print(days_off_yearly.sort_values(["from_year", ("out_days", "sum")], ascending=False).head(10))

But getting KeyError: 'from_year'.

Any help appreciated.

like image 946
Tomas Rasymas Avatar asked Feb 15 '17 14:02

Tomas Rasymas


People also ask

How do you sort values after Groupby pandas?

To group Pandas dataframe, we use groupby(). To sort grouped dataframe in ascending or descending order, use sort_values(). The size() method is used to get the dataframe size.

How do you sort values after Groupby?

Sort Values in Descending Order with Groupby You can sort values in descending order by using ascending=False param to sort_values() method. The head() function is used to get the first n rows. It is useful for quickly testing if your object has the right type of data in it.

Does pandas Groupby maintain order?

Groupby preserves the order of rows within each group. When calling apply, add group keys to index to identify pieces. Reduce the dimensionality of the return type if possible, otherwise return a consistent type.


1 Answers

You can use sort_values, but first reset_index and then set_index:

#simplier aggregation
days_off_yearly = persons.groupby(["from_year", "name"])['out_days'].sum()
print(days_off_yearly)
from_year  name 
2010       John     17
2011       John     15
           John1    18
2012       John     10
           John4    11
           John6     4
Name: out_days, dtype: int64

print (days_off_yearly.reset_index()
                      .sort_values(['from_year','out_days'],ascending=False)
                      .set_index(['from_year','name']))
                 out_days
from_year name           
2012      John4        11
          John         10
          John6         4
2011      John1        18
          John         15
2010      John         17
like image 198
jezrael Avatar answered Oct 11 '22 13:10

jezrael