Pandas: Mean of columns with the same names

Tags:

pandas

I have a dataframe with columns like:

['id','name','foo1', 'foo1', 'foo1', 'foo2','foo2', 'foo3']

I would like to get a new dataframe where columns sharing the same name are averaged:

['id','name','foo1', 'foo2','foo3']

Here column foo1 would be the average of the three columns named foo1 in the original dataframe, foo2 would be the average of the two columns named foo2, and foo3 would just be foo3.

Note: id and name are not numeric and I have to keep them.

asked Oct 28 '16 19:10

1 Answer

The basic idea is to group the columns by name and take the mean of each group.

Based on the comments on your question, here are a few different ways to do it. (Solution (3) is the best I found!)

(1) Quick solution. If only a few columns are non-numeric and they have unique names (e.g. columns id and name), you can do the following.

First, move ['id', 'name'] into the index to preserve them:

df = df.set_index(['id', 'name'])

Then call DataFrame.groupby on the columns with axis=1, so that groups are formed from column labels rather than row labels, and take the mean of each group:

df = df.groupby(by=df.columns, axis=1).mean()

Finally, reset the index to recover the ['id', 'name'] columns:

df = df.reset_index()

Here is a sample session:

In [35]: df = pd.DataFrame([['001', 'a', 1, 10, 100, 1000], ['002', 'b', 2, 20, 200, 2000]], columns=['id', 'name', 'c1', 'c2', 'c2', 'c3'], index=list('AB'))

In [36]: df = df.set_index(['id', 'name'])

In [37]: df = df.groupby(by=df.columns, axis=1).mean()

In [38]: df = df.reset_index()

In [39]: df
Out[39]: 
    id name  c1   c2    c3
0  001    a   1   55  1000
1  002    b   2  110  2000
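As far as I know, recent pandas releases (2.1 and later) deprecate passing axis=1 to groupby and suggest transposing instead. A minimal sketch of the same idea under that assumption, reusing the sample frame above (the names indexed, averaged and result are only illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    [['001', 'a', 1, 10, 100, 1000], ['002', 'b', 2, 20, 200, 2000]],
    columns=['id', 'name', 'c1', 'c2', 'c2', 'c3'],
    index=list('AB'),
)

# Park the non-numeric columns in the index, as in solution (1).
indexed = df.set_index(['id', 'name'])

# Transpose so the duplicated column labels become row labels,
# average the rows that share a label, then transpose back.
averaged = indexed.T.groupby(level=0).mean().T

# Bring id and name back as ordinary columns.
result = averaged.reset_index()
print(result)
```

Note that the averaged columns come back as floats, since mean always returns floating-point values for integer input.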

(2) Complete solution. If you have many non-numeric columns with unique names, you can do the following.

First, transpose your dataframe:

df2 = df.transpose()

Then group by its index (axis=0), handling each group according to its type: for numeric groups, return the mean; for non-numeric groups, return the first row:

df2 = df2.groupby(by=df2.index, axis=0).apply(lambda g: g.mean() if isinstance(g.iloc[0,0], numbers.Number) else g.iloc[0])

And finally, transpose back:

df = df2.transpose()

Here is a sample session:

In [98]: df = pd.DataFrame([['001', 'a', 1, 10, 100, 1000], ['002', 'b', 2, 20, 200, 2000]], columns=['id', 'name', 'c1', 'c2', 'c2', 'c3'], index=list('AB'))

In [99]: df2 = df.transpose()

In [100]: df2 = df2.groupby(by=df2.index, axis=0).apply(lambda g: g.mean() if isinstance(g.iloc[0,0], numbers.Number) else g.iloc[0])

In [101]: df3 = df2.transpose()

In [102]: df3
Out[102]: 
  c1   c2    c3   id name
A  1   55  1000  001    a
B  2  110  2000  002    b

In [103]: df
Out[103]: 
    id name  c1  c2   c2    c3
A  001    a   1  10  100  1000
B  002    b   2  20  200  2000

Note that this requires import numbers (a standard library module) beforehand.
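Transposing a frame with mixed types turns every column into object dtype, so another option is to split the columns by dtype first and only group the numeric part. This is a rough sketch, not the method above: it assumes the non-numeric columns have unique names (as in the question), and the helper names numeric_mask, numeric and other are made up for illustration.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame(
    [['001', 'a', 1, 10, 100, 1000], ['002', 'b', 2, 20, 200, 2000]],
    columns=['id', 'name', 'c1', 'c2', 'c2', 'c3'],
    index=list('AB'),
)

# Flag numeric columns by position; selecting by position sidesteps
# any ambiguity caused by the duplicated labels.
numeric_mask = np.array([is_numeric_dtype(dtype) for dtype in df.dtypes])
numeric = df.iloc[:, np.flatnonzero(numeric_mask)]
other = df.iloc[:, np.flatnonzero(~numeric_mask)]

# Average the numeric columns that share a name (via a transpose),
# then glue the untouched non-numeric columns back on the left.
averaged = numeric.T.groupby(level=0).mean().T
result = pd.concat([other, averaged], axis=1)
print(result)
```

Unlike the transposed groupby over all columns, this keeps id and name in front and leaves their dtype alone.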

More notes:

(3) All in one! This solution is the best I found:

df.groupby(by=df.columns, axis=1).apply(lambda g: g.mean(axis=1) if isinstance(g.iloc[0,0], numbers.Number) else g.iloc[:,0])

This applies a per-group function directly to the un-transposed column groups, that is,

df.groupby(by=df.columns, axis=1).apply(gf)

where

gf = lambda g: g.mean(axis=1) if isinstance(g.iloc[0,0], numbers.Number) else g.iloc[:,0]

My earlier attempts failed because I did not handle the axis carefully. You must pass axis=1 to mean, so that it averages across the columns of the group within each row, and return a single column (g.iloc[:,0]) for non-numeric groups.
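The same per-group rule can also be written without groupby at all, which may be useful where axis=1 grouping is unavailable and also keeps the original column order. This is only a sketch of that alternative: collapse_duplicate_columns is a hypothetical helper name, and it checks dtypes instead of inspecting the first cell with isinstance.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def collapse_duplicate_columns(df):
    """Average numeric columns sharing a name; keep the first copy otherwise."""
    collapsed = {}
    for name in df.columns.unique():  # unique() preserves first-seen order
        # A boolean mask always yields a DataFrame, even for a single column.
        block = df.loc[:, df.columns == name]
        if all(is_numeric_dtype(dtype) for dtype in block.dtypes):
            collapsed[name] = block.mean(axis=1)  # row-wise mean across duplicates
        else:
            collapsed[name] = block.iloc[:, 0]  # first column of the group
    return pd.DataFrame(collapsed)


df = pd.DataFrame(
    [['001', 'a', 1, 10, 100, 1000], ['002', 'b', 2, 20, 200, 2000]],
    columns=['id', 'name', 'c1', 'c2', 'c2', 'c3'],
    index=list('AB'),
)
result = collapse_duplicate_columns(df)
print(result)
```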

Thanks!

answered Oct 25 '22 14:10

rojeeer



user3635284
