If it's single row, I can get the iterator as following
import pandas as pd
import numpy as np
a = np.zeros((100,40))
X = pd.DataFrame(a)
for index, row in X.iterrows():
print index
print row
Now I want each iterator will return a subset X[0:9, :]
, X[5:14, :]
, X[10:19, :]
etc. How do I achieve this with rolling (pandas.DataFrame.rolling
)?
Window Rolling Mean (Moving Average)The moving average calculation creates an updated average value for each row based on the window we specify. The calculation is also called a “rolling mean” because it's calculating an average of values within a specified range for each row as you go along the DataFrame.
Itertuples() iterates through the data frame by converting each row of data as a list of tuples. itertuples() takes 16 seconds to iterate through a data frame with 10 million records that are around 50x times faster than iterrows().
By using apply and specifying one as the axis, we can run a function on every row of a dataframe. This solution also uses looping to get the job done, but apply has been optimized better than iterrows , which results in faster runtimes.
I'll experiment with the following dataframe.
import pandas as pd
import numpy as np
from string import uppercase
def generic_portfolio_df(start, end, freq, num_port, num_sec, seed=314):
np.random.seed(seed)
portfolios = pd.Index(['Portfolio {}'.format(i) for i in uppercase[:num_port]],
name='Portfolio')
securities = ['s{:02d}'.format(i) for i in range(num_sec)]
dates = pd.date_range(start, end, freq=freq)
return pd.DataFrame(np.random.rand(len(dates) * num_sec, num_port),
index=pd.MultiIndex.from_product([dates, securities],
names=['Date', 'Id']),
columns=portfolios
).groupby(level=0).apply(lambda x: x / x.sum())
df = generic_portfolio_df('2014-12-31', '2015-05-30', 'BM', 3, 5)
df.head(10)
I'll now introduce a function to roll a number of rows and concatenate into a single dataframe where I'll add a top level to the column index that indicates the location in the roll.
def rolled(df, n):
k = range(df.columns.nlevels)
_k = [i - len(k) for i in k]
myroll = pd.concat([df.shift(i).stack(level=k) for i in range(n)],
axis=1, keys=range(n)).unstack(level=_k)
return [(i, row.unstack(0)) for i, row in myroll.iterrows()]
Though its hidden in the function, myroll
would look like this
Now we can use it just like an iterator.
for i, roll in rolled(df.head(5), 3):
print roll
print
0 1 2
Portfolio
Portfolio A 0.326164 NaN NaN
Portfolio B 0.201597 NaN NaN
Portfolio C 0.085340 NaN NaN
0 1 2
Portfolio
Portfolio A 0.278614 0.326164 NaN
Portfolio B 0.314448 0.201597 NaN
Portfolio C 0.266392 0.085340 NaN
0 1 2
Portfolio
Portfolio A 0.258958 0.278614 0.326164
Portfolio B 0.089224 0.314448 0.201597
Portfolio C 0.293570 0.266392 0.085340
0 1 2
Portfolio
Portfolio A 0.092760 0.258958 0.278614
Portfolio B 0.262511 0.089224 0.314448
Portfolio C 0.084208 0.293570 0.266392
0 1 2
Portfolio
Portfolio A 0.043503 0.092760 0.258958
Portfolio B 0.132221 0.262511 0.089224
Portfolio C 0.270490 0.084208 0.293570
That's not how rolling works. It "provides rolling transformations" (from the docs).
You can loop and use pandas indexing?
for i in range((X.shape[0] + 9) // 10):
X_subset = X.iloc[i * 10: (i + 1) * 10])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With