
Sliding Window over Pandas Dataframe

I have a large pandas dataframe of time-series data.

I currently reduce this dataframe to a new, smaller dataframe that holds the mean of every block of 10 rows, i.e. a windowing technique with no overlap between windows. Like this:

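import numpy as np
import pandas as pd

def create_new_df(df):
    features = []
    x = df['X'].astype(float)
    i = x.index.values
    # repeat each index label 10 times so that 10 consecutive rows share one group key
    time_sequence = [i] * 10
    idx = np.array(time_sequence).T.flatten()[:len(x)]
    x = x.groupby(idx).mean()  # mean of each block of 10 rows
    x.name = 'X'
    features.append(x)
    new_df = pd.concat(features, axis=1)
    return new_df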
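
For reference, the group key built above amounts to dividing each row's position by the block size and discarding the remainder. A minimal sketch of that idea, where the helper name block_means and its size parameter are hypothetical and not part of the original code:

```python
import numpy as np
import pandas as pd

def block_means(df, size=10):
    # Hypothetical helper: label each row with the number of the block it falls in
    block_id = np.arange(len(df)) // size
    # Average the 'X' column within each non-overlapping block
    return df[['X']].astype(float).groupby(block_id).mean()

df = pd.DataFrame({'X': np.arange(20)})
result = block_means(df)
print(result)
```

This groups by row position rather than by index label, so it should give the same two block means on the test data below as long as the blocks are meant to be consecutive rows.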

Code to test:

columns = ['X']
df_ = pd.DataFrame(columns=columns)
df_ = df_.fillna(0) # with 0s rather than NaNs
data = np.array([np.arange(20)]*1).T
df = pd.DataFrame(data, columns=columns)

test = create_new_df(df)
print(test)

Output:

      X
0   4.5
1  14.5

However, I want the function to build the new dataframe using a sliding window with a 50% overlap, so that each window of 10 rows starts 5 rows after the previous one.

So the output would look like this:

      X
0   4.5
1   9.5
2  14.5

How can I do this?

Here's what I've tried:

from itertools import tee

def window(iterable, size):
    iters = tee(iterable, size)
    for i in range(1, size):
        for each in iters[i:]:
            next(each, None)
    return zip(*iters)

for each in window(df, 20):
    print(list(each)) # doesn't have the desired sliding window effect

Some might also suggest the pandas rolling_mean() function, but I can't see how to use it with overlapping windows.

Any help would be much appreciated.

cs_stackX asked Apr 29 '16 12:04



1 Answer

I think pandas rolling techniques are fine here. Note that starting with version 0.18.0 of pandas, you would use rolling().mean() instead of rolling_mean().

>>> df=pd.DataFrame({ 'x':range(30) })
>>> df = df.rolling(10).mean()           # version 0.18.0 syntax
>>> df[4::5]                             # take every 5th row

       x
4    NaN
9    4.5
14   9.5
19  14.5
24  19.5
29  24.5
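
The same idea can be wrapped in a small function that skips the leading NaN row and renumbers the result, which matches the output asked for in the question. This is only a sketch: the function name rolling_mean_with_overlap and its window and overlap parameters are assumptions made for illustration, and it relies on the rolling().mean() syntax from pandas 0.18.0 or later.

```python
import numpy as np
import pandas as pd

def rolling_mean_with_overlap(df, window=10, overlap=0.5):
    # Hypothetical helper: distance between the starts of consecutive windows
    step = int(window * (1 - overlap))
    if step < 1:
        raise ValueError("overlap is too large for this window size")
    rolled = df.rolling(window).mean()
    # The first complete window ends at position window - 1; keep every step-th row from there
    return rolled.iloc[window - 1::step].reset_index(drop=True)

df = pd.DataFrame({'X': np.arange(20)})
result = rolling_mean_with_overlap(df)
print(result)
```

Starting the slice at position window - 1 instead of 4 drops the incomplete first window, and reset_index(drop=True) gives the 0, 1, 2 row labels shown in the question.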
JohnE answered Sep 18 '22 09:09