How to apply a rolling Kalman Filter to a column in a DataFrame?

Tags:

pandas

kalman-filter

How to apply a rolling Kalman Filter to a DataFrame column (without using external data)?

That is, treat each row as a new point in time, so that the descriptive statistics must be updated (in a rolling manner) after each row.

For example, how can the Kalman Filter be applied to any column in the DataFrame below?

import numpy as np
import pandas as pd

n = 2000
index = pd.date_range(start='2000-01-01', periods=n)
data = np.random.randn(n, 4)
df = pd.DataFrame(data, columns=list('ABCD'), index=index)

I've seen previous responses (1 and 2); however, they do not apply the filter to a DataFrame column (and they are not vectorized).



asked Feb 12 '18 03:02

Greg

1 Answer

Using NumPy stride tricks and the pykalman library, we can apply the Kalman Filter to column D over a rolling window of 3:

import pandas as pd
from pykalman import KalmanFilter
import numpy as np

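def rolling_window(a, step):
    # Return a strided view (no copy) in which row i holds a[i:i+step].
    # Treat the result as read-only: its rows share memory with `a`.
    shape   = a.shape[:-1] + (a.shape[-1] - step + 1, step)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

def get_kf_value(y_values):
    kf = KalmanFilter()
    # Fit the model parameters to this window with one EM iteration.
    # Kc is the smoothed state mean and Ke its covariance.
    # Note: smooth() is called here with the scalar 0, not with y_values;
    # pass y_values instead if the window itself should be smoothed.
    Kc, Ke = kf.em(y_values, n_iter=1).smooth(0)
    return Kc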

n = 2000
index = pd.date_range(start='2000-01-01', periods=n)
data = np.random.randn(n, 4)
df = pd.DataFrame(data, columns=list('ABCD'), index=index)

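wsize = 3
arr = rolling_window(df.D.values, wsize)
# The first wsize-1 rows have no full window, so pad with all-zero windows
# to keep one output row per input row.
zero_padding = np.zeros(shape=(wsize-1,wsize))
arrst = np.concatenate((zero_padding, arr))
arrkalman = np.zeros(shape=(len(arrst),1))

# One filter is fitted per window, so this loop is not vectorized.
for i in range(len(arrst)):
    arrkalman[i] = get_kf_value(arrst[i])

# Attach the result as a new column aligned on the original index.
kalmandf = pd.DataFrame(arrkalman, columns=['D_kalman'], index=index)
df = pd.concat([df,kalmandf], axis=1)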

df.head() should yield something like this:

                   A         B         C         D  D_kalman
2000-01-01 -0.003156 -1.487031 -1.755621 -0.101233  0.000000
2000-01-02  0.172688 -0.767011 -0.965404 -0.131504  0.000000
2000-01-03 -0.025983 -0.388501 -0.904286  1.062163  0.013633
2000-01-04 -0.846606 -0.576383 -1.066489 -0.041979  0.068792
2000-01-05 -1.505048  0.498062  0.619800  0.012850  0.252550
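
If the goal is only an estimate that is updated after each row, a plain recursive filter may be enough, with no window and no pykalman. The following is a minimal sketch under assumptions that are not part of the answer above: a scalar local-level (random walk) model, and a hypothetical helper kalman_1d whose parameters process_var, obs_var, init_mean and init_var are illustrative defaults that would need tuning for real data. It also assumes the column has no missing values.

```python
import numpy as np
import pandas as pd


def kalman_1d(values, process_var=1e-3, obs_var=1.0, init_mean=0.0, init_var=1.0):
    """Scalar local-level Kalman filter (hypothetical helper).

    Returns the filtered state mean after each observation. The variances
    are assumed, fixed values; they are not estimated from the data.
    """
    values = np.asarray(values, dtype=float)
    filtered = np.empty_like(values)
    mean, var = init_mean, init_var
    for i, z in enumerate(values):
        # Predict: the state follows a random walk, so only uncertainty grows.
        var = var + process_var
        # Update: blend the prediction and the observation via the Kalman gain.
        gain = var / (var + obs_var)
        mean = mean + gain * (z - mean)
        var = (1.0 - gain) * var
        filtered[i] = mean
    return filtered


rng = np.random.default_rng(0)
n = 2000
index = pd.date_range(start='2000-01-01', periods=n)
df = pd.DataFrame(rng.standard_normal((n, 4)), columns=list('ABCD'), index=index)

# Each entry of D_kalman depends only on rows up to and including its own.
df['D_kalman'] = kalman_1d(df['D'].to_numpy())
```

The loop is still row by row, because each estimate depends on the previous one, but it does constant work per row instead of fitting a model per window.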

answered Jan 04 '23 00:01

Gursel Karacor
