Renaming Column Names in Pandas Groupby function [duplicate]

Q1) I want to do a SQL-style groupby aggregation and rename the output column:

Example dataset:

>>> df
    ID     Region  count
0  100       Asia      2
1  101     Europe      3
2  102         US      1
3  103     Africa      5
4  100     Russia      5
5  101  Australia      7
6  102         US      8
7  104       Asia     10
8  105     Europe     11
9  110     Africa     23

I want to group the observations of this dataset by ID and Region and sum the count for each group. So I used something like this...

>>> print(df.groupby(['ID','Region'], as_index=False)['count'].sum())
    ID     Region  count
0  100       Asia      2
1  100     Russia      5
2  101  Australia      7
3  101     Europe      3
4  102         US      9
5  103     Africa      5
6  104       Asia     10
7  105     Europe     11
8  110     Africa     23

By using as_index=False I am able to get "SQL-like" output. My problem is that I am unable to rename the aggregate variable count here. In SQL, if I wanted to do the above, I would write something like this:

select ID, Region, sum(count) as Total_Numbers from df group by ID, Region order by ID, Region

As we can see, it's very easy to rename the aggregate variable count to Total_Numbers in SQL. I want to do the same thing in Pandas, but I am unable to find such an option in the groupby function. Can somebody help?

The second question (more of an observation) is whether...

Q2) Is it possible to directly use column names in Pandas dataframe functions without enclosing them in quotes?

I understand that the variable names are strings, so they have to be inside quotes, but I see that when they are used outside a DataFrame function, as an attribute, they don't need to be quoted, as in df.ID.sum(). It's only when we use them in a DataFrame function like df.sort() or df.groupby that we have to put them inside quotes. This is actually a bit of a pain, since in SQL, SAS, or other languages we simply use the variable name without quoting it. Any suggestions on this?

Kindly reply to both questions (Q1 is the main, Q2 more of an opinion).

asked Oct 22 '13 16:10

Baktaawar

2 Answers

For the first question, I think the answer would be:

<your DataFrame>.rename(columns={'count':'Total_Numbers'})

or

<your DataFrame>.columns = ['ID', 'Region', 'Total_Numbers']
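A minimal sketch of both options applied to the question's data, assuming a reasonably recent pandas release. The variable names grouped, renamed, and relabeled are illustrative only and do not come from the original answer:

```python
import pandas as pd

# Rebuild the example dataset from the question.
df = pd.DataFrame({
    'ID': [100, 101, 102, 103, 100, 101, 102, 104, 105, 110],
    'Region': ['Asia', 'Europe', 'US', 'Africa', 'Russia',
               'Australia', 'US', 'Asia', 'Europe', 'Africa'],
    'count': [2, 3, 1, 5, 5, 7, 8, 10, 11, 23],
})

# Sum `count` per (ID, Region); as_index=False keeps the keys as columns.
grouped = df.groupby(['ID', 'Region'], as_index=False)['count'].sum()

# Option 1: rename returns a new DataFrame with the column relabelled.
renamed = grouped.rename(columns={'count': 'Total_Numbers'})

# Option 2: overwrite all column labels at once (the order must match).
relabeled = grouped.copy()
relabeled.columns = ['ID', 'Region', 'Total_Numbers']

print(renamed)
```

Option 1 only touches the columns named in the mapping, so it is the safer choice when the column order might change.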

As for the second one, I'd say the answer is no. It's possible to use it like 'df.ID' because of the Python data model:

Attribute references are translated to lookups in this dictionary, e.g., m.x is equivalent to m.__dict__["x"]
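To illustrate why attribute access is only a convenience, here is a small sketch (the two-row frame is made up for illustration). Attribute access works only when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute, so a column called count is shadowed by the DataFrame.count method:

```python
import pandas as pd

df = pd.DataFrame({'ID': [100, 101], 'count': [2, 3]})

# Attribute access and bracket access return the same column here.
same = df.ID.equals(df['ID'])

# `count` is also a DataFrame method, so the attribute is the method,
# not the column; bracket access is needed to reach the data.
attr_is_method = callable(df.count)
total = df['count'].sum()

print(same, attr_is_method, total)
```

This is one reason the quoted form is the more robust habit, even where the unquoted attribute form happens to work.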

answered Sep 17 '22 17:09

Roman Pekar

The current (as of version 0.20) method for changing column names after a groupby operation is to chain the rename method. See this deprecation note in the documentation for more detail.
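As a rough sketch of that chaining pattern (the small frame and the new column names are made up for illustration), aggregate a single column with a list of functions and then rename the resulting columns:

```python
import pandas as pd

df = pd.DataFrame({
    'A': list('wwxx'),
    'C': [1.0, 3.0, 2.0, 6.0],
})

# Selecting column C first keeps the result columns flat ('mean', 'median'),
# so a plain rename is enough to relabel them.
result = (
    df.groupby('A')['C']
      .agg(['mean', 'median'])
      .rename(columns={'mean': 'C_mean', 'median': 'C_median'})
)

print(result)
```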

Deprecated Answer as of pandas version 0.20

This is the first result on Google, and although the top answer works, it does not really answer the question. There is a better answer here and a long discussion on GitHub about the full functionality of passing dictionaries to the agg method.

These answers unfortunately do not exist in the documentation, but the general format for grouping, aggregating and then renaming columns uses a dictionary of dictionaries. The keys of the outer dictionary are the names of the columns to be aggregated. The inner dictionaries have the new column names as keys and the aggregating functions as values.

Before we get there, let's create a four column DataFrame.

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': list('wwwwxxxx'),
                   'B': list('yyzzyyzz'),
                   'C': np.random.rand(8),
                   'D': np.random.rand(8)})

   A  B         C         D
0  w  y  0.643784  0.828486
1  w  y  0.308682  0.994078
2  w  z  0.518000  0.725663
3  w  z  0.486656  0.259547
4  x  y  0.089913  0.238452
5  x  y  0.688177  0.753107
6  x  z  0.955035  0.462677
7  x  z  0.892066  0.368850

Let's say we want to group by columns A, B and aggregate column C with mean and median and aggregate column D with max. The following code would do this.

df.groupby(['A', 'B']).agg({'C':['mean', 'median'], 'D':'max'})

            D         C
          max      mean    median
A B
w y  0.994078  0.476233  0.476233
  z  0.725663  0.502328  0.502328
x y  0.753107  0.389045  0.389045
  z  0.462677  0.923551  0.923551

This returns a DataFrame with a hierarchical column index. The original question asked about renaming the columns in the same step. This is possible using a dictionary of dictionaries:

df.groupby(['A', 'B']).agg({'C':{'C_mean': 'mean', 'C_median': 'median'},
                            'D':{'D_max': 'max'}})

            D         C
        D_max    C_mean  C_median
A B
w y  0.994078  0.476233  0.476233
  z  0.725663  0.502328  0.502328
x y  0.753107  0.389045  0.389045
  z  0.462677  0.923551  0.923551

This renames the columns all in one go but still leaves the hierarchical column index, whose top level can be dropped with df.columns = df.columns.droplevel(0).
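For readers on newer releases: the dictionary-of-dictionaries form shown above was deprecated in 0.20 and, as far as I know, raises an error in pandas 1.0 and later. Below is a hedged sketch of its replacement, named aggregation (available since pandas 0.25), which yields flat, renamed columns in one step. The fixed numbers stand in for the random data and are not from the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    'A': list('wwwwxxxx'),
    'B': list('yyzzyyzz'),
    'C': [1.0, 3.0, 2.0, 4.0, 5.0, 7.0, 6.0, 8.0],
    'D': [8.0, 6.0, 7.0, 5.0, 4.0, 2.0, 3.0, 1.0],
})

# Each keyword is the output column name; each value is a
# (source column, aggregation function) pair.
result = df.groupby(['A', 'B']).agg(
    C_mean=('C', 'mean'),
    C_median=('C', 'median'),
    D_max=('D', 'max'),
)

print(result)
```

Because the result has a single level of column labels, no droplevel step is needed afterwards.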

answered Sep 18 '22 17:09

Ted Petrou


Tags: python, pandas, rename, group-by, pandas-groupby