Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to generate rolling subsequences into a dataframe in Python

I have the following data frame:

col0
1
2
3
4
5
...
1000

I'd like roll col0 into a data frame with a window size of 5, so the outcome would be like this:

col0 col1 col2 ... col995
1 2 3 ... 996
2 3 4 ... 997
3 4 5 ... 998
4 5 6 ... 999
5 6 7 ... 1000

I've tried using loops and "iloc" which would produce correct results, but as the original data frame gets much longer, it would take too long to finish. To complete 10,000, it'd take almost 2 minutes, 20,000 almost 10 minutes, and so on... Is there any way to do it faster, more efficiently in Python?

like image 426
ntintel Avatar asked Mar 07 '26 18:03

ntintel


2 Answers

Use numpy.lib.stride_tricks.sliding_window_view and transpose (T):

from numpy.lib.stride_tricks import sliding_window_view as swv

out = pd.DataFrame(swv(df['col0'], 5).T).add_prefix('col')

Output:

   col0  col1  col2  col3  col4  col5  col6  col7  col8  col9  ...  col986  \
0     1     2     3     4     5     6     7     8     9    10  ...     987   
1     2     3     4     5     6     7     8     9    10    11  ...     988   
2     3     4     5     6     7     8     9    10    11    12  ...     989   
3     4     5     6     7     8     9    10    11    12    13  ...     990   
4     5     6     7     8     9    10    11    12    13    14  ...     991   

   col987  col988  col989  col990  col991  col992  col993  col994  col995  
0     988     989     990     991     992     993     994     995     996  
1     989     990     991     992     993     994     995     996     997  
2     990     991     992     993     994     995     996     997     998  
3     991     992     993     994     995     996     997     998     999  
4     992     993     994     995     996     997     998     999    1000  

[5 rows x 996 columns]

Reproducible input:

N = 1000
df = pd.DataFrame({'col0': range(1, N+1)})

Timing for 100k rows:

25 ms ± 402 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
like image 109
mozway Avatar answered Mar 10 '26 08:03

mozway


You can use sliding_window_view for this:

import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

df = pd.DataFrame({'Col0': range(1,1001)})

data = sliding_window_view(df['Col0'], 5).T

df_new = pd.DataFrame(data, 
                      columns=[f'Col{i}' for i in range(data.shape[1])])

print(df_new)

   Col0  Col1  Col2  Col3  Col4  ...  Col991  Col992  Col993  Col994  Col995
0     1     2     3     4     5  ...     992     993     994     995     996
1     2     3     4     5     6  ...     993     994     995     996     997
2     3     4     5     6     7  ...     994     995     996     997     998
3     4     5     6     7     8  ...     995     996     997     998     999
4     5     6     7     8     9  ...     996     997     998     999    1000

[5 rows x 996 columns]
like image 20
ouroboros1 Avatar answered Mar 10 '26 08:03

ouroboros1



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!