 

How to fill nan values with rolling mean in pandas

I have a dataframe which contains NaN values in a few places. I am trying to perform data cleaning in which I fill the NaN values with the mean of the previous five instances. To do so, I have come up with the following:

input_data_frame[var_list].fillna(input_data_frame[var_list].rolling(5).mean(), inplace=True)

But this is not working: it doesn't fill the NaN values, and the dataframe's null count is the same before and after the above operation. Assuming I have a dataframe with just an integer column, how can I fill NaN values with the mean of the previous five instances? Thanks in advance.
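The attempt above most likely fails for two separate reasons. First, input_data_frame[var_list] builds a new DataFrame, so fillna(..., inplace=True) fills that temporary object and input_data_frame itself never changes. Second, rolling(5).mean() uses the default min_periods, which equals the window size, so the window ending at a NaN row contains that NaN and its mean is NaN as well, leaving nothing to fill with. A minimal sketch reproducing both effects on a hypothetical one-column frame (the column name x is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: one numeric column "x" with a single gap at row 5.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0, np.nan, 7.0]})
var_list = ["x"]

# Reason 1: with the default min_periods (equal to the window size), the
# window ending at the NaN row contains that NaN, so its mean is NaN too.
rolled = df[var_list].rolling(5).mean()
print(rolled.loc[5, "x"])  # nan

# Reason 2: df[var_list] returns a new DataFrame, so inplace=True fills that
# temporary object and df is left unchanged (pandas may also emit a
# copy / chained-assignment warning here).
df[var_list].fillna(0, inplace=True)
print(df["x"].isna().sum())  # 1

# Fix: assign the result back, and let NaN inside the window be skipped.
df[var_list] = df[var_list].fillna(df[var_list].rolling(6, min_periods=1).mean())
print(df.loc[5, "x"])  # 3.0
```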

Asked by VaM999, Mar 08 '18 12:03





2 Answers

The pd.rolling_mean function has been removed from newer versions of pandas in favor of the .rolling() method. If you want to fill the entire dataset, you can use:

filled_dataset = dataset.fillna(dataset.rolling(6, min_periods=1).mean())
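Because the NaN in the current row is skipped, a window of 6 with min_periods=1 should produce the same fill values as averaging only the five rows before each row. That "previous five rows" reading can also be written explicitly with shift(1) followed by rolling(5, min_periods=1); this is an alternative formulation, not part of the answer above. A small sketch comparing the two on a hypothetical dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with one isolated NaN and two consecutive NaNs.
dataset = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 5.0, np.nan, 6.0, np.nan, np.nan]})

# The answer's approach: a window of 6 that includes the NaN row itself.
filled_window6 = dataset.fillna(dataset.rolling(6, min_periods=1).mean())

# Explicit "previous five rows": shift by one so the current row is excluded.
filled_shift5 = dataset.fillna(dataset.shift(1).rolling(5, min_periods=1).mean())

print(filled_window6.equals(filled_shift5))  # True
print(filled_window6["a"].tolist())
# [1.0, 2.0, 3.0, 4.0, 5.0, 3.0, 6.0, 4.5, 5.0]
```

The shift-based version states the intent ("previous five") directly, while the window-of-6 version relies on the current row being NaN wherever a fill actually happens.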
Answered by Caner Erden, Sep 22 '22 12:09



This should work:

input_data_frame[var_list] = input_data_frame[var_list].fillna(input_data_frame[var_list].rolling(6, min_periods=1).mean())

Note that the window is 6 because it includes the NaN value itself (which is not counted in the average). The other NaN values in the window are not used for the average either, so if fewer than 5 valid values are found in the window, the average is calculated on the values that are actually present.
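To see how NaNs inside the window and short windows behave, here is a small sketch on a hypothetical Series. Row 2 is filled from the single real value before it, rows 4 and 5 from the two real values available in their windows, and the leading NaN stays NaN because its window contains no real values at all:

```python
import numpy as np
import pandas as pd

# Hypothetical series: a leading NaN, an isolated NaN, and two consecutive NaNs.
s = pd.Series([np.nan, 2.0, np.nan, 4.0, np.nan, np.nan])

# NaNs inside each 6-row window are skipped; min_periods=1 lets a window with
# at least one real value produce a mean even when it holds fewer than five.
means = s.rolling(6, min_periods=1).mean()
filled = s.fillna(means)

print(filled.tolist())
# [nan, 2.0, 2.0, 4.0, 3.0, 3.0]
```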

Example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 3, 4, 5, np.nan, 1, 1, 2, 3, 4, 5, np.nan]})
print(df)

      a
0   1.0
1   1.0
2   2.0
3   3.0
4   4.0
5   5.0
6   NaN
7   1.0
8   1.0
9   2.0
10  3.0
11  4.0
12  5.0
13  NaN

Output after df = df.fillna(df.rolling(6, min_periods=1).mean()):

      a
0   1.0
1   1.0
2   2.0
3   3.0
4   4.0
5   5.0
6   3.0
7   1.0
8   1.0
9   2.0
10  3.0
11  4.0
12  5.0
13  3.0
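The same call also works when only some columns should be cleaned, because fillna with a DataFrame argument aligns on both index and column labels, so each column is filled from its own rolling mean. A sketch with hypothetical column names, where var_list selects the columns to clean and the remaining column keeps its NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "a" and "b" should be cleaned, "c" should not.
input_data_frame = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, np.nan],
    "b": [10.0, np.nan, 30.0, 40.0, 50.0, 60.0],
    "c": [np.nan, 1.0, 1.0, 1.0, 1.0, 1.0],
})
var_list = ["a", "b"]

# Compute the fill values per column and assign the result back, since
# input_data_frame[var_list] on its own is a new DataFrame, not a view.
subset = input_data_frame[var_list]
input_data_frame[var_list] = subset.fillna(subset.rolling(6, min_periods=1).mean())

print(input_data_frame)
```

Here row 5 of a becomes 3.0 (the mean of 1 to 5), row 1 of b becomes 10.0 (only one real value precedes it), and c is left untouched.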
Answered by Joe, Sep 22 '22 12:09
