I would like to fill the NaN values in df with the average of the adjacent elements.
Consider a dataframe:
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'val': [1, np.nan, 4, 5, np.nan, 10, 1, 2, 5, np.nan, np.nan, 9]})

         val
    0    1.0
    1    NaN
    2    4.0
    3    5.0
    4    NaN
    5   10.0
    6    1.0
    7    2.0
    8    5.0
    9    NaN
    10   NaN
    11   9.0
My desired output is:
         val
    0    1.0
    1    2.5
    2    4.0
    3    5.0
    4    7.5
    5   10.0
    6    1.0
    7    2.0
    8    5.0
    9    7.0   <<< deadend
    10   7.0   <<< deadend
    11   9.0
I've looked into other solutions such as "Fill cell containing NaN with average of value before and after", but that approach doesn't work when there are two or more consecutive np.nan values.
Any help is greatly appreciated!
Key points about pandas mean(): by default it ignores NaN values and computes the mean along the index axis.
The ffill() function fills each missing value in the dataframe with the last valid value before it; bfill() does the same with the next valid value after it.
To fill with a mean, compute the mean of the column that contains NaN with mean(), then pass it to fillna() to replace the NaN values.
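For reference, here is a minimal sketch of the column-mean fill described above, using the dataframe from the question. The names col_mean and filled are only illustrative. Note that this replaces every NaN with the same overall mean, which is not the neighbor average the question asks for.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'val': [1, np.nan, 4, 5, np.nan, 10, 1, 2, 5, np.nan, np.nan, 9]})

# mean() skips NaN by default, so this averages the 8 valid values only
col_mean = df['val'].mean()

# fillna() replaces every NaN with that single value
filled = df['val'].fillna(col_mean)
print(filled)
```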
Use ffill + bfill and divide by 2:
    df = (df.ffill() + df.bfill()) / 2
    print(df)

         val
    0    1.0
    1    2.5
    2    4.0
    3    5.0
    4    7.5
    5   10.0
    6    1.0
    7    2.0
    8    5.0
    9    7.0
    10   7.0
    11   9.0
EDIT: If the first or last element is NaN, then use (Dark's suggestion):
    df = pd.DataFrame({'val': [np.nan, 1, np.nan, 4, 5, np.nan, 10, 1, 2, 5, np.nan, np.nan, 9, np.nan]})
    df = (df.ffill() + df.bfill()) / 2
    df = df.bfill().ffill()
    print(df)

         val
    0    1.0
    1    1.0
    2    2.5
    3    4.0
    4    5.0
    5    7.5
    6   10.0
    7    1.0
    8    2.0
    9    5.0
    10   7.0
    11   7.0
    12   9.0
    13   9.0
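If you need this in more than one place, the two steps can be wrapped in a small helper. This is only a sketch: the function name fill_with_neighbor_mean is made up for illustration, and it assumes a numeric Series or DataFrame.

```python
import numpy as np
import pandas as pd

def fill_with_neighbor_mean(obj):
    """Fill NaN with the mean of the nearest valid values before and after.

    A leading or trailing NaN has only one valid neighbor, so it takes
    that neighbor's value instead.
    """
    # Interior gaps: average of the forward-filled and backward-filled values
    averaged = (obj.ffill() + obj.bfill()) / 2
    # Edges are still NaN after averaging (NaN + x is NaN), so patch them
    return averaged.bfill().ffill()

s = pd.Series([np.nan, 1, np.nan, 4, 5, np.nan, 10, 1, 2, 5, np.nan, np.nan, 9, np.nan])
result = fill_with_neighbor_mean(s)
print(result)
```

Because ffill and bfill leave existing values untouched, averaging them returns each non-missing value unchanged, so only the gaps are affected.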
Although it doesn't produce the exact output you specified when there are multiple NaNs in a row, other users reaching this page may actually prefer the effect of the interpolate() method:
    df = df.interpolate()
    print(df)

         val
    0    1.0
    1    2.5
    2    4.0
    3    5.0
    4    7.5
    5   10.0
    6    1.0
    7    2.0
    8    5.0
    9    6.3
    10   7.7
    11   9.0
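To make the difference concrete, here is a small side-by-side sketch on just the run of two consecutive NaNs between 5 and 9 (the variable names are illustrative). The neighbor average gives both gaps the same value, while interpolate(), which is linear by default, spaces them evenly between the two endpoints.

```python
import numpy as np
import pandas as pd

s = pd.Series([5, np.nan, np.nan, 9], dtype=float)

# Neighbor average: both gaps get (5 + 9) / 2
neighbor_avg = (s.ffill() + s.bfill()) / 2

# Linear interpolation: gaps are placed at equal steps from 5 to 9
linear = s.interpolate()

print(pd.DataFrame({'neighbor_avg': neighbor_avg, 'linear': linear}))
```

For a single isolated NaN, both approaches give the same result, since the midpoint of a linear interpolation is the average of the two neighbors.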