pandas filling nans by mean of before and after non-nan values

I would like to fill the NaN values in df with the average of the adjacent non-NaN elements.

Consider a dataframe:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'val': [1, np.nan, 4, 5, np.nan, 10, 1, 2, 5, np.nan, np.nan, 9]})
```

```
     val
0    1.0
1    NaN
2    4.0
3    5.0
4    NaN
5   10.0
6    1.0
7    2.0
8    5.0
9    NaN
10   NaN
11   9.0
```

My desired output is:

```
     val
0    1.0
1    2.5
2    4.0
3    5.0
4    7.5
5   10.0
6    1.0
7    2.0
8    5.0
9    7.0   <<< deadend
10   7.0   <<< deadend
11   9.0
```

I've looked into other solutions, such as "Fill cell containing NaN with average of value before and after", but they don't work when there are two or more consecutive NaNs.

Any help is greatly appreciated!

asked Jan 29 '19 05:01

Chris

2 Answers

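Add the results of ffill and bfill, then divide by 2. Each NaN becomes the mean of the nearest non-NaN value before it and the nearest one after it: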

```python
df = (df.ffill() + df.bfill()) / 2

print(df)
```

```
     val
0    1.0
1    2.5
2    4.0
3    5.0
4    7.5
5   10.0
6    1.0
7    2.0
8    5.0
9    7.0
10   7.0
11   9.0
```
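To see why this works, it can help to look at the two filled columns side by side. The following is only an illustrative sketch, not part of the original answer; it rebuilds the example frame from the question, and the column names `prev`, `next`, and `mean` are made up for the illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'val': [1, np.nan, 4, 5, np.nan, 10, 1, 2, 5, np.nan, np.nan, 9]})

steps = pd.DataFrame({
    'prev': df['val'].ffill(),  # nearest non-NaN value at or before each row
    'next': df['val'].bfill(),  # nearest non-NaN value at or after each row
})
# Rows that already hold a value have prev == next, so the mean leaves them unchanged.
steps['mean'] = (steps['prev'] + steps['next']) / 2

# Rows 9 and 10 form the consecutive-NaN run; both see prev=5.0 and next=9.0.
print(steps.loc[8:11])
```

Because every row in a run of NaNs sees the same `prev` and `next`, the whole run receives the same value, which is the behavior asked for in the question.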

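EDIT: If the first or last element is NaN, the expression above leaves it as NaN, because one of the two fills has nothing to copy from. In that case, follow it with bfill and ffill (Dark's suggestion):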

```python
df = pd.DataFrame({'val': [np.nan, 1, np.nan, 4, 5, np.nan,
                           10, 1, 2, 5, np.nan, np.nan, 9, np.nan]})
df = (df.ffill() + df.bfill()) / 2
df = df.bfill().ffill()

print(df)
```

```
     val
0    1.0
1    1.0
2    2.5
3    4.0
4    5.0
5    7.5
6   10.0
7    1.0
8    2.0
9    5.0
10   7.0
11   7.0
12   9.0
13   9.0
```

answered Sep 30 '22 19:09

Space Impact

Although it doesn't produce the exact output you specified when there are multiple NaNs in a row, other users reaching this page may actually prefer the effect of the interpolate() method, which fills each gap with evenly spaced values:

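```python
df = df.interpolate()

print(df)
```

```
     val
0    1.0
1    2.5
2    4.0
3    5.0
4    7.5
5   10.0
6    1.0
7    2.0
8    5.0
9    6.3
10   7.7
11   9.0
```

The output above is rounded to one decimal place; the exact interpolated values at rows 9 and 10 are 5 + 4/3 ≈ 6.33 and 5 + 8/3 ≈ 7.67.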
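To make the difference between the two answers concrete, here is a small side-by-side sketch on just the consecutive-NaN run from the question (5, NaN, NaN, 9). It is an illustration rather than part of either answer, and the column names `neighbor_mean` and `linear` are made up for the comparison.

```python
import numpy as np
import pandas as pd

s = pd.Series([5, np.nan, np.nan, 9], dtype=float)

# Every NaN in the gap gets the same value, (5 + 9) / 2.
neighbor_mean = (s.ffill() + s.bfill()) / 2
# Default method='linear': values step evenly from 5 up to 9.
linear = s.interpolate()

comparison = pd.DataFrame({'neighbor_mean': neighbor_mean, 'linear': linear})
print(comparison)
```

The neighbor-mean approach yields a flat plateau across the gap, while linear interpolation yields a ramp. Which one is preferable depends on whether the missing values are expected to change gradually between the known points.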

answered Sep 30 '22 21:09

matthme
