How to fill nan values with rolling mean in pandas

Tags: python, pandas, dataframe, nan, mean

I have a dataframe that contains NaN values in a few places. I am trying to clean the data by filling each NaN value with the mean of its previous five instances. To do so, I have come up with the following.

input_data_frame[var_list].fillna(input_data_frame[var_list].rolling(5).mean(), inplace=True)

But this is not working: it isn't filling the NaN values. The dataframe's null count is the same before and after the above operation. Assuming I have a dataframe with just integer columns, how can I fill NaN values with the mean of the previous five instances? Thanks in advance.

asked Mar 08 '18 12:03

VaM999

2 Answers

The pd.rolling_mean function has been removed from current pandas; use the .rolling() method instead. If you want to fill the entire dataset, you can use:

filled_dataset = dataset.fillna(dataset.rolling(6,min_periods=1).mean())
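As a hedged sketch of how this applies to the question's column subset: selecting columns with a list returns a new object, so calling fillna(..., inplace=True) on that selection does not change the original frame, and rolling(5) with its default min_periods yields NaN at the NaN row itself because that row only has four valid values in its window. Assigning the filled result back avoids both problems. The column names and values below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names and values are illustrative only.
input_data_frame = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, np.nan, 7.0],
    "y": [10.0, np.nan, 30.0, 40.0, 50.0, 60.0, np.nan],
    "label": list("abcdefg"),  # non-numeric column, left untouched
})
var_list = ["x", "y"]

subset = input_data_frame[var_list]

# Window of 6 = the NaN row itself plus up to five preceding rows;
# min_periods=1 lets the mean use whatever valid values exist.
rolling_means = subset.rolling(6, min_periods=1).mean()

# Assign back instead of calling inplace=True on the selection.
input_data_frame[var_list] = subset.fillna(rolling_means)

print(input_data_frame)
print(input_data_frame[var_list].isna().sum())
```

Here x at row 5 is filled with the mean of 1 to 5, and y at row 6 with the mean of the four valid values among the five preceding rows.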

answered Sep 22 '22 12:09

Caner Erden

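This should work on older pandas versions (pd.rolling_mean was deprecated in pandas 0.18 and later removed; on current versions, use the .rolling(6, min_periods=1).mean() method instead):

input_data_frame[var_list] = input_data_frame[var_list].fillna(pd.rolling_mean(input_data_frame[var_list], 6, min_periods=1))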

Note that the window is 6 because it includes the NaN value itself (which is not counted in the average). Other NaN values in the window are also excluded from the average, so if fewer than 5 values are found in the window, the average is calculated on the values actually present.
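The window-size point can be checked with a small sketch using the current .rolling() method rather than pd.rolling_mean. The series values and variable names here are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 1, 2, 3, 4, 5, np.nan], dtype="float64")

# A window of 5 ending at the NaN row covers rows 2..6: 2, 3, 4, 5, NaN.
# Only four real values are averaged.
mean_window_5 = s.rolling(5, min_periods=1).mean().iloc[-1]

# A window of 6 covers rows 1..6: 1, 2, 3, 4, 5, NaN.
# The NaN is skipped, so exactly the previous five values are averaged.
mean_window_6 = s.rolling(6, min_periods=1).mean().iloc[-1]

print(mean_window_5)  # 3.5
print(mean_window_6)  # 3.0

# With fewer than five earlier values, min_periods=1 still gives a mean
# of whatever is available instead of NaN.
short = pd.Series([2.0, 4.0, np.nan])
mean_short = short.rolling(6, min_periods=1).mean().iloc[-1]
print(mean_short)  # 3.0
```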

Example:

import numpy as np
import pandas as pd

data = {'a': [1, 1, 2, 3, 4, 5, np.nan, 1, 1, 2, 3, 4, 5, np.nan]}
df = pd.DataFrame(data=data)
print(df)

      a
0   1.0
1   1.0
2   2.0
3   3.0
4   4.0
5   5.0
6   NaN
7   1.0
8   1.0
9   2.0
10  3.0
11  4.0
12  5.0
13  NaN

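Output: after the fill, rows 6 and 13 both become 3.0 (the mean of the five preceding values 1, 2, 3, 4, 5); all other rows are unchanged.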

answered Sep 22 '22 12:09

Joe
