Pandas Dataframe: Replacing NaN with row average

Tags: python, pandas, dataframe, missing-data

I am trying to learn pandas, but I have been puzzled by the following. I want to replace NaNs in a DataFrame with the row average, so something like df.fillna(df.mean(axis=1)) should work, but for some reason it fails for me. Am I missing something, or is there something wrong with what I'm doing? Is it because it's not implemented? See link here

import pandas as pd
import numpy as np

pd.__version__
Out[44]:
'0.15.2'

In [45]:
df = pd.DataFrame()
df['c1'] = [1, 2, 3]
df['c2'] = [4, 5, 6]
df['c3'] = [7, np.nan, 9]
df

Out[45]:
    c1  c2  c3
0   1   4   7
1   2   5   NaN
2   3   6   9

In [46]:  
df.fillna(df.mean(axis=1)) 

Out[46]:
    c1  c2  c3
0   1   4   7
1   2   5   NaN
2   3   6   9

However, filling with the column averages seems to work fine:

df.fillna(df.mean(axis=0)) 

Out[47]:
    c1  c2  c3
0   1   4   7
1   2   5   8
2   3   6   9


asked Oct 10 '15 20:10

Aenaon

2 Answers

As commented, the axis argument to fillna is not implemented for filling with a Series, so the following raises NotImplementedError:

df.fillna(df.mean(axis=1), axis=1)

Note: the axis matters here, because you don't want to fill the nth column with the nth row average.
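To make the note concrete, here is a minimal sketch of what fillna does with a Series: it matches the Series index against the column labels and fills column by column. The names row_means, unchanged, relabeled and misfilled are only illustrative, and the relabeled frame is a made-up variation on the question's data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"c1": [1, 2, 3], "c2": [4, 5, 6], "c3": [7, np.nan, 9]})

# Row averages, indexed by the row labels 0, 1, 2.
row_means = df.mean(axis=1)

# fillna looks up each column label in the Series index.
# None of "c1", "c2", "c3" appear in 0, 1, 2, so nothing is filled.
unchanged = df.fillna(row_means)
print(unchanged)

# Hypothetical variation: if the columns happened to be labelled 0, 1, 2,
# the labels would match and column 2 would be filled with the average
# of row 2 (6.0) instead of the average of row 1 (3.5).
relabeled = df.copy()
relabeled.columns = [0, 1, 2]
misfilled = relabeled.fillna(row_means)
print(misfilled)
```

This is why the question's call returns the frame unchanged rather than raising an error: the fill values exist, but none of them line up with a column label.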

For now you'll need to iterate through:

m = df.mean(axis=1)
for i, col in enumerate(df):
    # using i allows for duplicate columns
    # inplace *may* not always work here, so IMO the next line is preferred
    # df.iloc[:, i].fillna(m, inplace=True)
    df.iloc[:, i] = df.iloc[:, i].fillna(m)

print(df)
   c1  c2   c3
0   1   4  7.0
1   2   5  3.5
2   3   6  9.0

An alternative is to fillna the transpose and then transpose back, which may be more efficient:

df.T.fillna(df.mean(axis=1)).T
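As a rough, self-contained sketch of the transpose approach on the question's data (the variable names row_means and filled are just for illustration): after transposing, the original row labels become column labels, so the row averages line up with the columns that fillna fills.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"c1": [1, 2, 3], "c2": [4, 5, 6], "c3": [7, np.nan, 9]})

# Row averages, indexed by the original row labels 0, 1, 2.
row_means = df.mean(axis=1)

# df.T has columns 0, 1, 2, which match the index of row_means,
# so each transposed column is filled with its own row average.
filled = df.T.fillna(row_means).T

print(filled)
print(filled.dtypes)
```

One thing to keep in mind: transposing a frame with mixed integer and float columns produces a single float block, so in this example the integer columns come back as floats. With non-numeric columns in the frame the round trip may change dtypes further, so this sketch assumes an all-numeric DataFrame.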


answered Sep 19 '22 00:09

Andy Hayden

As an alternative, you could also use an apply with a lambda expression like this:

df.apply(lambda row: row.fillna(row.mean()), axis=1)

which also yields

    c1   c2   c3
0  1.0  4.0  7.0
1  2.0  5.0  3.5
2  3.0  6.0  9.0

answered Sep 17 '22 00:09

Cleb
