I have a large pandas dataframe of time-series data.
I currently manipulate this dataframe to create a new, smaller dataframe that is a rolling average of every 10 rows, i.e. a rolling-window technique. Like this:
import numpy as np
import pandas as pd

def create_new_df(df):
    features = []
    x = df['X'].astype(float)
    i = x.index.values
    # repeat each index label 10 times so that rows 0-9 share label 0, rows 10-19 label 1, ...
    time_sequence = [i] * 10
    idx = np.array(time_sequence).T.flatten()[:len(x)]
    x = x.groupby(idx).mean()
    x.name = 'X'
    features.append(x)
    new_df = pd.concat(features, axis=1)
    return new_df
Code to test:
columns = ['X']
df_ = pd.DataFrame(columns=columns)
df_ = df_.fillna(0) # with 0s rather than NaNs
data = np.array([np.arange(20)]*1).T
df = pd.DataFrame(data, columns=columns)
test = create_new_df(df)
print(test)
Output:
X
0 4.5
1 14.5
However, I want the function to build the new dataframe using a sliding window with a 50% overlap.
So the output would look like this:
X
0 4.5
1 9.5
2 14.5
How can I do this?
Here's what I've tried:
from itertools import tee

def window(iterable, size):
    iters = tee(iterable, size)
    for i in range(1, size):
        for each in iters[i:]:
            next(each, None)
    return zip(*iters)

for each in window(df, 20):
    print(list(each))  # doesn't have the desired sliding window effect
Some might also suggest using the pandas rolling_mean() methods, but if so, I can't see how to use this function with window overlap.
Any help would be much appreciated.
The rolling() function provides rolling window calculations. The concept is used mainly in signal processing and time-series analysis. Put simply, we take a window of k rows at a time and apply some mathematical operation to it.
Rolling window calculations in pandas: the rolling() function is used to provide rolling window calculations. Syntax: Series.rolling(self, window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None)
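As a rough illustration of the rolling() call described above, the sketch below applies it to a short Series. The sample data and variable names are made up for illustration and are not from the original post. It shows how min_periods changes what happens before the first window is full.

```python
import pandas as pd

# Hypothetical sample data, not taken from the original post.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Window of 3 rows: the first two results are NaN because the window is not yet full.
full_windows = s.rolling(window=3).mean()

# With min_periods=1, the partial windows at the start are averaged as well.
partial_windows = s.rolling(window=3, min_periods=1).mean()

print(full_windows.tolist())     # [nan, nan, 2.0, 3.0, 4.0]
print(partial_windows.tolist())  # [1.0, 1.5, 2.0, 3.0, 4.0]
```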
Window functions allow us to perform an operation with a given row's data and data from another row that is a specified number of rows away — this “number of rows away value” is called the window.
I think pandas rolling techniques are fine here. Note that starting with version 0.18.0 of pandas, you would use rolling().mean() instead of rolling_mean().
>>> df=pd.DataFrame({ 'x':range(30) })
>>> df = df.rolling(10).mean() # version 0.18.0 syntax
>>> df[4::5] # take every 5th row
x
4 NaN
9 4.5
14 9.5
19 14.5
24 19.5
29 24.5
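To tie this back to the question's test case, the same idea can be wrapped in a function that takes the window size and step as arguments. This is only a sketch: the name overlapping_mean and its parameters are my own, and it assumes that rows before the first full window should be dropped rather than shown as NaN.

```python
import numpy as np
import pandas as pd


def overlapping_mean(df, window=10, step=5):
    """Mean over windows of `window` rows, advancing `step` rows each time.

    step = window // 2 gives the 50% overlap asked for in the question.
    (Hypothetical helper; name and signature are illustrative only.)
    """
    rolled = df.astype(float).rolling(window).mean()
    # Row window-1 is the first one backed by a full window; keep every
    # `step`-th row from there and renumber the result from 0.
    return rolled.iloc[window - 1::step].reset_index(drop=True)


# Same test data as in the question: a single column X holding 0..19.
df = pd.DataFrame({'X': np.arange(20)})
print(overlapping_mean(df))
```

For the 20-row test frame this yields three rows, 4.5, 9.5 and 14.5, which is the output the question asks for. Slicing from window - 1 instead of from row 4 is what removes the leading NaN row seen in the session above.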