Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do you rename all columns in multi level group by in pandas 0.20.1+

Tags:

python

pandas

With the release of Pandas 0.20.1, there is a new deprecation of the functionality to groupby.agg() with a dictionary for renaming.

Deprecation documentation

I'm trying to find best way to update my code to account for this, however I'm struggling with how I've currently been utilizing this rename functionality.

When I am doing an aggregate, I often have multiple functions for each source column, and I have been using this rename functionality to get to a single level index with these new column names.

Example:

df = pd.DataFrame({'A': [1, 1, 1, 2, 2],'B': range(5),'C': range(5)})

In [30]: df
Out[30]: 
   A  B  C
0  1  0  0
1  1  1  1
2  1  2  2
3  2  3  3
4  2  4  4

frame = df.groupby('A').agg({'B' : {'foo':'sum'}, 'C': {'bar' : 'min', 'bar2': 'max'}})

Which results in:

Out[33]: 
    B   C     
  foo bar bar2
A             
1   3   0    2
2   7   3    4

Which I then typically do:

frame = pd.DataFrame(frame).reset_index(col_level=1)

frame.columns = frame.columns.get_level_values(1)

frame
Out[42]: 
   A  foo  bar  bar2
0  1    3    0     2
1  2    7    3     4

So I'm looking for good ways to get a result dataframe that is single level index, but has new unique column names. Where multiple columns originated from an aggregate from a single source column. Any recommendations of best approach is greatly appreciated.

like image 781
Mark Doom Avatar asked May 10 '17 14:05

Mark Doom


People also ask

How do I name all columns in Pandas?

One way of renaming the columns in a Pandas Dataframe is by using the rename() function.

How do I rename multiple columns?

To change multiple column names by name and by index use rename() function of the dplyr package and to rename by just name use setnames() from data. table . From R base functionality, we have colnames() and names() functions that can be used to rename a data frame column by a single index or name.


1 Answers

This works perfectly in 0.20.1 version:

d = {'sum':'foo','min':'bar','max':'bar2'}
frame = df.groupby('A').agg({'B' : ['sum'], 'C': ['min', 'max']}).rename(columns=d)
frame.columns = frame.columns.droplevel(0)
frame = frame.reset_index()
print (frame)
   A  foo  bar  bar2
0  1    3    0     2
1  2    7    3     4

If multiple mins:

d = {'B_sum':'foo','C_min':'bar','C_max':'bar2'}
frame = df.groupby('A').agg({'B' : ['sum'], 'C': ['min', 'max']})
frame.columns = frame.columns.map('_'.join)
frame = frame.reset_index().rename(columns=d)
print (frame)
   A  foo  bar  bar2
0  1    3    0     2
1  2    7    3     4
like image 59
jezrael Avatar answered Oct 20 '22 17:10

jezrael