 

Pandas rolling returns NaN when infinity values are involved

When using rolling on a series that contains inf values, the result contains NaN even if the operation is well defined, as it is for min or max. For example:

import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.inf, 5, 6])
print(s.rolling(window=3).min())

This gives:

0    NaN
1    NaN
2    1.0
3    NaN
4    NaN
5    NaN
dtype: float64

while I expected

0    NaN
1    NaN
2    1.0
3    2.0
4    3.0
5    5.0

Computing the minimum of the series directly works as expected:

s.min()  # 1.0

What is the reason for additional NaN values being introduced?


Python 3.8.1, pandas 1.0.2

a_guest asked Mar 19 '20 22:03





1 Answer

np.inf is explicitly converted to np.nan in pandas/core/window/rolling.py:

# Convert inf to nan for C funcs
inf = np.isinf(values)
if inf.any():
    values = np.where(inf, np.nan, values)

The question "How to represent inf or -inf in Cython with numpy?" gives information on why they had to do this.
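To see the effect of that conversion in isolation, here is a minimal sketch that applies the same masking by hand before calling rolling. The variable names (values, converted, rolled) are only illustrative and not taken from pandas itself. Because the inf is already NaN by the time rolling sees it, the result should not depend on how a given pandas version treats inf internally.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 3, np.inf, 5, 6], dtype="float64")

# Same masking as the quoted snippet: every +inf or -inf becomes NaN.
inf = np.isinf(values)
if inf.any():
    values = np.where(inf, np.nan, values)

converted = pd.Series(values)

# Each window of size 3 that touches index 3 now holds only two valid
# observations, so those windows produce NaN.
rolled = converted.rolling(window=3).min()
print(rolled)
```

This reproduces the pattern from the question: only index 2 carries a value (1.0), and every other position is NaN.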


You'd find the exact same behavior if you had NaN instead of np.inf. Getting your expected output is difficult because min_periods, which defaults to the window size for a fixed-size window, discards those intermediate windows for lacking sufficient observations. One clean "hack" is to replace inf with the largest representable float, which should be fairly safe when taking the min.

import numpy as np
s.replace(np.inf, np.finfo('float64').max).rolling(3).min()

#0    NaN
#1    NaN
#2    1.0
#3    2.0
#4    3.0
#5    5.0
#dtype: float64
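
One gap in the replacement trick is a window that consists entirely of inf: it would report the huge sentinel value rather than inf. A possible way around that, sketched below, is to map the sentinel back to inf afterwards. The helper name rolling_min_with_inf is hypothetical, and the sketch assumes the data never legitimately contains the largest float64 value and that only +inf needs handling.

```python
import numpy as np
import pandas as pd


def rolling_min_with_inf(s, window):
    """Hypothetical helper: rolling min that keeps +inf as a valid value."""
    big = np.finfo("float64").max
    # Swap +inf for the largest finite float so rolling sees only finite data.
    out = s.replace(np.inf, big).rolling(window).min()
    # A window made up entirely of +inf yields the sentinel; restore inf there.
    return out.replace(big, np.inf)


s = pd.Series([1, 2, 3, np.inf, 5, 6])
result = rolling_min_with_inf(s, 3)
print(result)

# Edge case: the first full window contains nothing but inf.
t = pd.Series([np.inf, np.inf, np.inf, 1.0])
result_all_inf = rolling_min_with_inf(t, 3)
print(result_all_inf)
```

For s this gives the same values as the one-liner above, while for t the first full window comes back as inf instead of the sentinel.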
ALollz answered Nov 12 '22 18:11
