If it's single row, I can get the iterator as following
import pandas as pd
import numpy as np
a = np.zeros((100,40))
X = pd.DataFrame(a)
for index, row in X.iterrows():
    print index
    print row
Now I want each iterator will return a subset X[0:9, :], X[5:14, :], X[10:19, :] etc. How do I achieve this with rolling (pandas.DataFrame.rolling)?
Window Rolling Mean (Moving Average)The moving average calculation creates an updated average value for each row based on the window we specify. The calculation is also called a “rolling mean” because it's calculating an average of values within a specified range for each row as you go along the DataFrame.
Itertuples() iterates through the data frame by converting each row of data as a list of tuples. itertuples() takes 16 seconds to iterate through a data frame with 10 million records that are around 50x times faster than iterrows().
By using apply and specifying one as the axis, we can run a function on every row of a dataframe. This solution also uses looping to get the job done, but apply has been optimized better than iterrows , which results in faster runtimes.
I'll experiment with the following dataframe.
import pandas as pd
import numpy as np
from string import uppercase
def generic_portfolio_df(start, end, freq, num_port, num_sec, seed=314):
    np.random.seed(seed)
    portfolios = pd.Index(['Portfolio {}'.format(i) for i in uppercase[:num_port]],
                          name='Portfolio')
    securities = ['s{:02d}'.format(i) for i in range(num_sec)]
    dates = pd.date_range(start, end, freq=freq)
    return pd.DataFrame(np.random.rand(len(dates) * num_sec, num_port),
                        index=pd.MultiIndex.from_product([dates, securities],
                                                         names=['Date', 'Id']),
                        columns=portfolios
                       ).groupby(level=0).apply(lambda x: x / x.sum())    
df = generic_portfolio_df('2014-12-31', '2015-05-30', 'BM', 3, 5)
df.head(10)

I'll now introduce a function to roll a number of rows and concatenate into a single dataframe where I'll add a top level to the column index that indicates the location in the roll.
def rolled(df, n):
    k = range(df.columns.nlevels)
    _k = [i - len(k) for i in k]
    myroll = pd.concat([df.shift(i).stack(level=k) for i in range(n)],
                       axis=1, keys=range(n)).unstack(level=_k)
    return [(i, row.unstack(0)) for i, row in myroll.iterrows()]
Though its hidden in the function, myroll would look like this

Now we can use it just like an iterator.
for i, roll in rolled(df.head(5), 3):
    print roll
    print
                    0   1   2
Portfolio                    
Portfolio A  0.326164 NaN NaN
Portfolio B  0.201597 NaN NaN
Portfolio C  0.085340 NaN NaN
                    0         1   2
Portfolio                          
Portfolio A  0.278614  0.326164 NaN
Portfolio B  0.314448  0.201597 NaN
Portfolio C  0.266392  0.085340 NaN
                    0         1         2
Portfolio                                
Portfolio A  0.258958  0.278614  0.326164
Portfolio B  0.089224  0.314448  0.201597
Portfolio C  0.293570  0.266392  0.085340
                    0         1         2
Portfolio                                
Portfolio A  0.092760  0.258958  0.278614
Portfolio B  0.262511  0.089224  0.314448
Portfolio C  0.084208  0.293570  0.266392
                    0         1         2
Portfolio                                
Portfolio A  0.043503  0.092760  0.258958
Portfolio B  0.132221  0.262511  0.089224
Portfolio C  0.270490  0.084208  0.293570
                        That's not how rolling works. It "provides rolling transformations" (from the docs).
You can loop and use pandas indexing?
for i in range((X.shape[0] + 9) // 10):
    X_subset = X.iloc[i * 10: (i + 1) * 10])
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With