How to iterate over time periods in pandas

Tags: python, pandas

Suppose I have a pandas Series with a daily DatetimeIndex. I want to iterate over this Series at an arbitrary frequency with an arbitrary look-back window. For example: iterate every six months with a look-back window of one year.

Something like this would be ideal:

for df_year in df.timegroup(freq='6m', lookback='1y'):
   # df_year will span one year of daily prices and be generated every 6 months

I know about TimeGrouper but haven't figured out how it could do this. Anyway, I could just code this manually but was hoping for a clever pandas one-liner.

Edit: This is getting a bit closer:

pd.rolling_apply(df, 252, lambda s: s.sum(), freq=pd.datetools.BMonthEnd())

This doesn't quite work, because it applies a look-back window of 252*BMonthEnd(), whereas I'd like the two to be independent: a look-back window of 252 days, evaluated at every month end.

twiecki asked Mar 17 '15 13:03


2 Answers

I think this is what you are looking for.

Construct a series with some frequency. I'm using a constant value of 1 here for clarity.

In [77]: i = pd.date_range('20110101','20150101',freq='B')

In [78]: s = pd.Series(1,index=i)

In [79]: s
Out[79]: 
2011-01-03    1
2011-01-04    1
2011-01-05    1
2011-01-06    1
2011-01-07    1
             ..
2014-12-26    1
2014-12-29    1
2014-12-30    1
2014-12-31    1
2015-01-01    1
Freq: B, dtype: int64

In [80]: len(s)
Out[80]: 1044

Conform the index to another frequency. Here, this sets every index element to the end of its month.

In [81]: s.index = s.index.to_period('M').to_timestamp('M')

In [82]: s
Out[82]: 
2011-01-31    1
2011-01-31    1
2011-01-31    1
2011-01-31    1
2011-01-31    1
             ..
2014-12-31    1
2014-12-31    1
2014-12-31    1
2014-12-31    1
2015-01-31    1
dtype: int64

Then it's straightforward to resample to another frequency. In this case, this gives you the number of business days in each period.

In [83]: s.resample('3M').sum()
Out[83]: 
2011-01-31    21
2011-04-30    64
2011-07-31    65
2011-10-31    66
2012-01-31    66
              ..
2014-01-31    66
2014-04-30    63
2014-07-31    66
2014-10-31    66
2015-01-31    44
Freq: 3M, dtype: int64
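
For the "252 days at every month end" case from the question's edit, one possible approach is to take a rolling sum first and then keep the last value in each month. The following is a minimal sketch under that assumption, not a tested general solution; the names rolling_sum and monthly are illustrative. Note that rolling(252) counts 252 observations (rows), not calendar days, and that an offset object is used for the frequency instead of a string alias.

```python
import pandas as pd

# Same toy series as above: one observation per business day, all ones.
i = pd.date_range('20110101', '20150101', freq='B')
s = pd.Series(1, index=i)

# Trailing sum over the last 252 observations, computed for every row.
# The first 251 entries are NaN because the window is not yet full.
rolling_sum = s.rolling(252).sum()

# Keep only the last non-missing value within each calendar month,
# labelled by the month end. The window length (252 rows) and the
# evaluation frequency (monthly) are now independent of each other.
monthly = rolling_sum.resample(pd.offsets.MonthEnd()).last()

print(monthly.dropna().head())
```

Months in which the window has not yet filled stay NaN, so they are easy to drop afterwards.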
Jeff answered Oct 11 '22 14:10


This solution provides a one-liner using a list comprehension. Starting from the left of the time series and iterating forward (backward iteration could also be done), each iteration returns a slice of the index whose length equals the lookback window, and successive slices start one step of the frequency apart. Note that the very last period is likely a stub with a length less than the lookback window.

This method uses days rather than month or week offsets.

import pandas as pd

freq = 30      # Step between window starts, in days
lookback = 60  # Window length, in days

idx = pd.date_range('2010-01-01', '2015-01-01')
[idx[(freq * n):(lookback + freq * n)] for n in range(len(idx) // freq)]

Out[86]: 
[<class 'pandas.tseries.index.DatetimeIndex'>
 [2010-01-01, ..., 2010-03-01]
 Length: 60, Freq: D, Timezone: None,
 <class 'pandas.tseries.index.DatetimeIndex'>
 [2010-01-31, ..., 2010-03-31]
 Length: 60, Freq: D, Timezone: None,
...
 Length: 60, Freq: D, Timezone: None,
 <class 'pandas.tseries.index.DatetimeIndex'>
 [2014-11-06, ..., 2015-01-01]
 Length: 57, Freq: D, Timezone: None]
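
If calendar offsets are preferred over fixed day counts, the same idea can be sketched as a small generator. This is only an illustration: iter_lookback is a hypothetical helper, not a pandas function, and it assumes a sorted DatetimeIndex. Each window covers the half-open interval (end - lookback, end].

```python
import pandas as pd

def iter_lookback(series, freq, lookback):
    """Yield (end, window) pairs. Hypothetical helper, not part of pandas."""
    first, last = series.index[0], series.index[-1]
    # Evaluation dates start one full look-back after the first observation
    # and then advance by `freq` until the end of the data.
    for end in pd.date_range(first + lookback, last, freq=freq):
        # Keep rows with end - lookback < timestamp <= end.
        mask = (series.index > end - lookback) & (series.index <= end)
        yield end, series[mask]

idx = pd.date_range('2010-01-01', '2015-01-01')
prices = pd.Series(range(len(idx)), index=idx)  # placeholder data

windows = list(iter_lookback(prices,
                             freq=pd.DateOffset(months=6),
                             lookback=pd.DateOffset(years=1)))

for end, window in windows:
    print(end.date(), window.index[0].date(), len(window))
```

Because the offsets are calendar-based, window lengths vary slightly (365 or 366 rows for daily data), and there is no stub period at the end.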
Alexander answered Oct 11 '22 16:10