I am trying to learn pandas but I have been puzzled with the following. I want to replace NaNs in a DataFrame with the row average. Hence something like df.fillna(df.mean(axis=1))
should work but for some reason it fails for me. Am I missing anything, is there something wrong with what I'm doing? Is it because its not implemented? see link here
import pandas as pd
import numpy as np
pd.__version__
Out[44]:
'0.15.2'
In [45]:
df = pd.DataFrame()
df['c1'] = [1, 2, 3]
df['c2'] = [4, 5, 6]
df['c3'] = [7, np.nan, 9]
df
Out[45]:
c1 c2 c3
0 1 4 7
1 2 5 NaN
2 3 6 9
In [46]:
df.fillna(df.mean(axis=1))
Out[46]:
c1 c2 c3
0 1 4 7
1 2 5 NaN
2 3 6 9
However something like this looks to work fine
df.fillna(df.mean(axis=0))
Out[47]:
c1 c2 c3
0 1 4 7
1 2 5 8
2 3 6 9
For mean, use the mean() function. Calculate the mean for the column with NaN and use the fillna() to fill the NaN values with the mean.
You can use the fillna() function to replace NaN values in a pandas DataFrame.
By using replace() or fillna() methods you can replace NaN values with Blank/Empty string in Pandas DataFrame. NaN stands for Not A Number and is one of the common ways to represent the missing data value in Python/Pandas DataFrame.
By using dropna() method you can drop rows with NaN (Not a Number) and None values from pandas DataFrame. Note that by default it returns the copy of the DataFrame after removing rows. If you wanted to remove from the existing DataFrame, you should use inplace=True .
As commented the axis argument to fillna is NotImplemented.
df.fillna(df.mean(axis=1), axis=1)
Note: this would be critical here as you don't want to fill in your nth columns with the nth row average.
For now you'll need to iterate through:
m = df.mean(axis=1) for i, col in enumerate(df): # using i allows for duplicate columns # inplace *may* not always work here, so IMO the next line is preferred # df.iloc[:, i].fillna(m, inplace=True) df.iloc[:, i] = df.iloc[:, i].fillna(m) print(df) c1 c2 c3 0 1 4 7.0 1 2 5 3.5 2 3 6 9.0
An alternative is to fillna the transpose and then transpose, which may be more efficient...
df.T.fillna(df.mean(axis=1)).T
As an alternative, you could also use an apply
with a lambda
expression like this:
df.apply(lambda row: row.fillna(row.mean()), axis=1)
yielding also
c1 c2 c3
0 1.0 4.0 7.0
1 2.0 5.0 3.5
2 3.0 6.0 9.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With