Suppose I have a pandas Series with a DatetimeIndex at daily frequency. I want to iterate over this Series at an arbitrary frequency and with an arbitrary look-back window. For example: iterate half-yearly with a look-back window of one year.
Something like this would be ideal:
for df_year in df.timegroup(freq='6m', lookback='1y'):
    # df_year will span one year of daily prices and be generated every 6 months
I know about TimeGrouper but haven't figured out how it could do this. Anyway, I could just code this manually, but I was hoping for a clever pandas one-liner.
Edit: This is getting a bit closer:
pd.rolling_apply(df, 252, lambda s: s.sum(), freq=pd.datetools.BMonthEnd())
This doesn't quite work, because it applies a look-back window of 252 * BMonthEnd() periods, whereas I'd like the two to be independent: a look-back window of 252 days, evaluated at every month end.
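For readers on a current pandas release, where pd.rolling_apply and pd.datetools no longer exist, the intent described here can be sketched roughly as follows. This is only an illustration under assumptions: the sample series of ones and the variable names are made up, and "252 days" is read as 252 rows of a business-day series.

```python
import pandas as pd

# Hypothetical sample data: a business-day series of ones.
idx = pd.date_range("2010-01-01", "2012-12-31", freq="B")
s = pd.Series(1.0, index=idx)

# Step 1: rolling sum over the previous 252 observations (NaN until the window fills).
rolled = s.rolling(252).sum()

# Step 2: independently of the window length, keep only the last
# observation of each calendar month.
month_end = rolled.groupby(rolled.index.to_period("M")).tail(1)

print(month_end.dropna().head())
```

The window length and the sampling frequency are set in two separate steps, which is the independence the question asks for.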
We can use the date_range() function that is available in pandas. It returns a fixed-frequency DatetimeIndex. Iterating over that index yields Timestamp objects, and calling date() on each one gives the plain date.
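A minimal sketch of that idea, with made-up start and end dates:

```python
import pandas as pd

# Fixed-frequency DatetimeIndex: one entry per calendar day.
idx = pd.date_range("2021-01-01", "2021-01-05", freq="D")

# Each element is a pandas Timestamp; .date() converts it to datetime.date.
dates = [ts.date() for ts in idx]
print(dates)
```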
Vectorization is always the first and best choice. If you do have to iterate, you can convert the DataFrame to a NumPy array or to a dictionary to speed up the loop. Iterating through the key-value pairs of a dictionary comes out as the fastest way, with around a 280x speed-up for 20 million records.
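To make the three options concrete, here is a small sketch computing the same total each way. The column names and values are invented for illustration, and no timing is implied by this toy example.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"price": [10.0, 11.0, 12.0], "qty": [1, 2, 3]})

# Vectorized: operate on whole columns at once.
total_vectorized = (df["price"] * df["qty"]).sum()

# NumPy array: drop down to the underlying array.
arr = df.to_numpy()
total_numpy = float((arr[:, 0] * arr[:, 1]).sum())

# Dictionary records: plain Python iteration over dicts.
records = df.to_dict("records")
total_dict = sum(r["price"] * r["qty"] for r in records)

print(total_vectorized, total_numpy, total_dict)
```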
Iterating over the rows of a DataFrame: you can do so using either the iterrows() or the itertuples() built-in method.
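A short sketch of both methods on an invented two-row frame:

```python
import pandas as pd

# Hypothetical toy data with a date index.
df = pd.DataFrame(
    {"price": [10.0, 11.0], "qty": [1, 2]},
    index=pd.to_datetime(["2021-01-04", "2021-01-05"]),
)

# iterrows() yields (index label, row as a Series).
totals_rows = [row["price"] * row["qty"] for _, row in df.iterrows()]

# itertuples() yields namedtuples; the index label is available as .Index.
totals_tuples = [t.price * t.qty for t in df.itertuples()]

print(totals_rows, totals_tuples)
```

itertuples() is generally the lighter of the two, since it does not build a Series per row.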
I think this is what you are looking for.
Construct a series at some frequency. Using a constant value of 1 for clarity here.
In [77]: i = pd.date_range('20110101','20150101',freq='B')
In [78]: s = pd.Series(1, index=i)
In [79]: s
Out[79]:
2011-01-03 1
2011-01-04 1
2011-01-05 1
2011-01-06 1
2011-01-07 1
..
2014-12-26 1
2014-12-29 1
2014-12-30 1
2014-12-31 1
2015-01-01 1
Freq: B, dtype: int64
In [80]: len(s)
Out[80]: 1044
Conform the index to another frequency. This makes every index element be the end-of-month here.
In [81]: s.index = s.index.to_period('M').to_timestamp('M')
In [82]: s
Out[82]:
2011-01-31 1
2011-01-31 1
2011-01-31 1
2011-01-31 1
2011-01-31 1
..
2014-12-31 1
2014-12-31 1
2014-12-31 1
2014-12-31 1
2015-01-31 1
dtype: int64
Then it's straightforward to resample to another frequency. In this case, that gives you the number of business days in each period.
In [83]: s.resample('3M',how='sum')
Out[83]:
2011-01-31 21
2011-04-30 64
2011-07-31 65
2011-10-31 66
2012-01-31 66
..
2014-01-31 66
2014-04-30 63
2014-07-31 66
2014-10-31 66
2015-01-31 44
Freq: 3M, dtype: int64
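The how= keyword in the session above belongs to older pandas releases. On a recent version, the same count can be sketched as below. This is an assumption-laden variant rather than the answer's exact code: it passes an offset object instead of the '3M' string and resamples the business-day series directly, because the bins already fall on month-end boundaries.

```python
import pandas as pd

# Same business-day series of ones as in the session above.
i = pd.date_range("20110101", "20150101", freq="B")
s = pd.Series(1, index=i)

# Bin into 3-month periods ending on month ends, then sum each bin.
# pd.offsets.MonthEnd(3) is used instead of a string alias.
counts = s.resample(pd.offsets.MonthEnd(3)).sum()

print(counts.head())
```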
This solution provides a one-liner using a list comprehension. Starting from the left of the time series and iterating forward (backward iteration could also be done), each iteration returns a subset of the index whose length equals the lookback window, and then jumps ahead by a step equal to the frequency. Note that the very last period is likely a stub shorter than the lookback window.
This method uses days rather than month or week offsets.
freq = 30 # Days
lookback = 60 # Days
idx = pd.date_range('2010-01-01', '2015-01-01')
[idx[(freq * n):(lookback + freq * n)] for n in range(int(len(idx) / freq))]
Out[86]:
[<class 'pandas.tseries.index.DatetimeIndex'>
[2010-01-01, ..., 2010-03-01]
Length: 60, Freq: D, Timezone: None,
<class 'pandas.tseries.index.DatetimeIndex'>
[2010-01-31, ..., 2010-03-31]
Length: 60, Freq: D, Timezone: None,
...
Length: 60, Freq: D, Timezone: None,
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-11-06, ..., 2015-01-01]
Length: 57, Freq: D, Timezone: None]
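If calendar offsets are preferred over fixed day counts, the same idea can be sketched with a small generator. The helper name iter_lookback and the sample series are hypothetical, introduced only for illustration; the example uses the half-yearly step and one-year lookback from the question.

```python
import pandas as pd

def iter_lookback(series, freq, lookback):
    """Yield (end, window) pairs; hypothetical helper, not a pandas API.

    Each window covers the interval (end - lookback, end].
    """
    # First window ends once a full lookback period is available.
    ends = pd.date_range(series.index[0] + lookback, series.index[-1], freq=freq)
    for end in ends:
        start = end - lookback
        mask = (series.index > start) & (series.index <= end)
        yield end, series.loc[mask]

# Hypothetical daily data.
idx = pd.date_range("2010-01-01", "2015-01-01", freq="D")
s = pd.Series(range(len(idx)), index=idx)

# Every 6 months, look back 1 year.
windows = list(iter_lookback(s, pd.DateOffset(months=6), pd.DateOffset(years=1)))

for end, window in windows:
    print(end.date(), len(window))
```

Unlike the fixed-day version, the window lengths here vary between 365 and 366 rows depending on leap years, and there is no trailing stub because iteration starts only once a full lookback is available.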