Pandas Dataframe: Replacing NaN with row average

I am trying to learn pandas, but the following has me puzzled. I want to replace the NaNs in a DataFrame with the row average, so something like df.fillna(df.mean(axis=1)) should work, but for some reason it fails for me. Am I missing something, or is there something wrong with what I'm doing? Is it because it's not implemented?

In [44]:
import pandas as pd
import numpy as np

pd.__version__
Out[44]:
'0.15.2'

In [45]:
df = pd.DataFrame()
df['c1'] = [1, 2, 3]
df['c2'] = [4, 5, 6]
df['c3'] = [7, np.nan, 9]
df

Out[45]:
    c1  c2  c3
0   1   4   7
1   2   5   NaN
2   3   6   9

In [46]:  
df.fillna(df.mean(axis=1)) 

Out[46]:
    c1  c2  c3
0   1   4   7
1   2   5   NaN
2   3   6   9

However, something like this seems to work fine:

In [47]:
df.fillna(df.mean(axis=0))

Out[47]:
    c1  c2  c3
0   1   4   7
1   2   5   8
2   3   6   9
asked Oct 10 '15 at 20:10 by Aenaon



2 Answers

As noted in the comments, the axis argument to fillna is not implemented for this case, so the following does not work:

df.fillna(df.mean(axis=1), axis=1) 

Note: the axis argument would be critical here, as you don't want to fill the nth column with the nth row's average.
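To see why the plain call in the question leaves the NaN untouched, here is a minimal sketch, assuming the same small df as in the question. When fillna is given a Series, it looks up each column label in the Series index. The row means are indexed by 0, 1, 2, which match none of c1, c2, c3, so nothing is filled; the column means are indexed by c1, c2, c3, so they are.

```python
import numpy as np
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame({"c1": [1, 2, 3], "c2": [4, 5, 6], "c3": [7, np.nan, 9]})

row_means = df.mean(axis=1)  # indexed by the row labels 0, 1, 2
col_means = df.mean(axis=0)  # indexed by the column labels c1, c2, c3

# fillna matches the Series index against the column labels.
filled_by_row = df.fillna(row_means)  # no label matches, so the NaN stays
filled_by_col = df.fillna(col_means)  # "c3" matches, so the NaN becomes 8.0

print(filled_by_row.isna().sum().sum())  # number of NaNs still present
print(filled_by_col.loc[1, "c3"])
```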

For now you'll need to iterate through:

m = df.mean(axis=1)
for i, col in enumerate(df):
    # using i allows for duplicate columns
    # inplace *may* not always work here, so IMO the next line is preferred
    # df.iloc[:, i].fillna(m, inplace=True)
    df.iloc[:, i] = df.iloc[:, i].fillna(m)

print(df)

   c1  c2   c3
0   1   4  7.0
1   2   5  3.5
2   3   6  9.0

An alternative is to fillna the transpose and then transpose back, which may be more efficient:

df.T.fillna(df.mean(axis=1)).T 
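As a self-contained sketch of the transpose approach, again assuming the df from the question (the variable names row_means and filled are only illustrative):

```python
import numpy as np
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame({"c1": [1, 2, 3], "c2": [4, 5, 6], "c3": [7, np.nan, 9]})

row_means = df.mean(axis=1)  # indexed by the row labels 0, 1, 2

# After transposing, the original row labels (0, 1, 2) become column labels,
# so fillna can match them against the index of row_means.
filled = df.T.fillna(row_means).T

print(filled)
```

Transposing a frame whose columns have different dtypes upcasts them to a common dtype, so the integer columns come back as floats here.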
answered Sep 19 '22 at 00:09 by Andy Hayden


As an alternative, you could also use an apply with a lambda expression like this:

df.apply(lambda row: row.fillna(row.mean()), axis=1)

which yields:

    c1   c2   c3
0  1.0  4.0  7.0
1  2.0  5.0  3.5
2  3.0  6.0  9.0
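The same idea can be wrapped in a small helper. This is only a sketch: fill_with_row_mean is a hypothetical name, and the frame below has an extra NaN added to show that several gaps in one row are filled with the same row mean.

```python
import numpy as np
import pandas as pd


def fill_with_row_mean(frame):
    """Return a new frame with each NaN replaced by the mean of its row."""
    # axis=1 passes each row to the lambda as a Series;
    # Series.mean skips NaN values by default.
    return frame.apply(lambda row: row.fillna(row.mean()), axis=1)


# Variation on the question's frame with two NaNs in the middle row.
df = pd.DataFrame({"c1": [1, np.nan, 3], "c2": [4, 5, 6], "c3": [7, np.nan, 9]})
filled = fill_with_row_mean(df)

print(filled)
```

A row that consists entirely of NaN has no mean, so it stays NaN with this approach. The original df is left unchanged.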
answered Sep 17 '22 at 00:09 by Cleb