
pandas filling nans by mean of before and after non-nan values

I would like to fill the NaNs in df with the average of the adjacent non-NaN elements.

Consider a dataframe:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'val': [1, np.nan, 4, 5, np.nan, 10, 1, 2, 5, np.nan, np.nan, 9]})

         val
    0    1.0
    1    NaN
    2    4.0
    3    5.0
    4    NaN
    5   10.0
    6    1.0
    7    2.0
    8    5.0
    9    NaN
    10   NaN
    11   9.0

My desired output is:

         val
    0    1.0
    1    2.5
    2    4.0
    3    5.0
    4    7.5
    5   10.0
    6    1.0
    7    2.0
    8    5.0
    9    7.0   <<< deadend
    10   7.0   <<< deadend
    11   9.0

I've looked into other solutions such as "Fill cell containing NaN with average of value before and after", but this won't work when there are two or more consecutive np.nan values.
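To illustrate the dead end, here is a minimal sketch that assumes the linked approach averages the immediately preceding and following rows with shift(); that is my assumption about the linked answer, not a quote from it. The names neighbour_mean and filled are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'val': [1, np.nan, 4, 5, np.nan, 10, 1, 2, 5, np.nan, np.nan, 9]})

# Average of the immediately preceding and following rows.
# If either neighbour is itself NaN, the average is NaN too.
neighbour_mean = (df.shift(1) + df.shift(-1)) / 2

# Fill only the NaNs for which both immediate neighbours are known.
filled = df.fillna(neighbour_mean)
print(filled)
```

With this sketch, rows 1 and 4 are filled, but rows 9 and 10 stay NaN because each has a NaN as one of its immediate neighbours.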

Any help is greatly appreciated!

asked Jan 29 '19 05:01 by Chris



2 Answers

Use ffill + bfill and divide by 2:

    df = (df.ffill() + df.bfill()) / 2
    print(df)

         val
    0    1.0
    1    2.5
    2    4.0
    3    5.0
    4    7.5
    5   10.0
    6    1.0
    7    2.0
    8    5.0
    9    7.0
    10   7.0
    11   9.0
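To see why this works, it can help to look at the intermediate columns. The sketch below is only an illustration; the column names prev, next and filled are made up here. For a non-NaN row, ffill and bfill both return the row's own value, so the average leaves it unchanged. For a NaN row, they return the nearest valid value before and after it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'val': [1, np.nan, 4, 5, np.nan, 10, 1, 2, 5, np.nan, np.nan, 9]})

steps = pd.DataFrame({
    'val': df['val'],
    'prev': df['val'].ffill(),   # nearest valid value at or before each row
    'next': df['val'].bfill(),   # nearest valid value at or after each row
})

# Averaging the two gives the original value for valid rows and the
# midpoint of the surrounding valid values for NaN rows.
steps['filled'] = (steps['prev'] + steps['next']) / 2
print(steps)
```

Every row inside a run of consecutive NaNs sees the same prev and next, which is why rows 9 and 10 both become 7.0.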

EDIT: If the first or last element is NaN, then use (Dark's suggestion):

    df = pd.DataFrame({'val': [np.nan, 1, np.nan, 4, 5, np.nan,
                               10, 1, 2, 5, np.nan, np.nan, 9, np.nan]})
    df = (df.ffill() + df.bfill()) / 2
    df = df.bfill().ffill()
    print(df)

         val
    0    1.0
    1    1.0
    2    2.5
    3    4.0
    4    5.0
    5    7.5
    6   10.0
    7    1.0
    8    2.0
    9    5.0
    10   7.0
    11   7.0
    12   9.0
    13   9.0

answered Sep 30 '22 19:09 by Space Impact
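If you need this in more than one place, the same idea can be wrapped in a small helper. This is just a sketch, and fill_with_neighbour_mean is a hypothetical name, not a pandas function. It handles the edges by falling back to whichever side has a valid value, which should give the same result as the bfill().ffill() step above.

```python
import numpy as np
import pandas as pd


def fill_with_neighbour_mean(s: pd.Series) -> pd.Series:
    """Fill NaNs with the mean of the nearest valid values before and after.

    Leading/trailing NaNs have only one valid side, so they take that value.
    """
    prev = s.ffill()           # nearest valid value at or before each row
    nxt = s.bfill()            # nearest valid value at or after each row
    prev = prev.fillna(nxt)    # leading NaNs: no previous value, use the next one
    nxt = nxt.fillna(prev)     # trailing NaNs: no next value, use the previous one
    return (prev + nxt) / 2


df = pd.DataFrame({'val': [np.nan, 1, np.nan, 4, 5, np.nan,
                           10, 1, 2, 5, np.nan, np.nan, 9, np.nan]})
df['val'] = fill_with_neighbour_mean(df['val'])
print(df)
```

For several numeric columns, something like df.apply(fill_with_neighbour_mean) should work column by column.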

Space Impact


Although it doesn't produce the exact output you specified when there are multiple NaNs in a row, other users reaching this page may actually prefer the effect of the interpolate() method:

    df = df.interpolate()
    print(df)

         val
    0    1.0
    1    2.5
    2    4.0
    3    5.0
    4    7.5
    5   10.0
    6    1.0
    7    2.0
    8    5.0
    9    6.3
    10   7.7
    11   9.0

answered Sep 30 '22 21:09 by matthme
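For a side-by-side look at the difference, here is a small sketch comparing the two approaches on the question's data. The names comparison, neighbour_mean and interpolate are only column labels chosen for this example.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 4, 5, np.nan, 10, 1, 2, 5, np.nan, np.nan, 9], name='val')

comparison = pd.DataFrame({
    # same value for every NaN in a run: midpoint of the surrounding values
    'neighbour_mean': (s.ffill() + s.bfill()) / 2,
    # default linear interpolation: evenly spaced steps across the run
    'interpolate': s.interpolate(),
})

# Only the run of consecutive NaNs (rows 9 and 10) differs.
print(comparison.loc[8:11])
```

For a single isolated NaN the two give the same result. Note that interpolate() also takes a limit_direction argument that matters when the series starts or ends with NaN; the pandas documentation describes its exact behaviour.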