After fighting with NumPy and dateutil for days, I recently discovered the amazing Pandas library. I've been poring through the documentation and source code, but I can't figure out how to get date_range() to generate indices at the right breakpoints.
    from datetime import date
    import pandas as pd

    start = date(2012, 1, 15)
    end = date(2012, 9, 20)
    # 'M' is month-end; I need the same day of each month instead
    pd.date_range(start, end, freq='M')
What I want:
2012-01-15 2012-02-15 2012-03-15 ... 2012-09-15
What I get:
2012-01-31 2012-02-29 2012-03-31 ... 2012-08-31
I need month-sized chunks that account for the variable number of days in a month. This is possible with dateutil.rrule:
rrule(freq=MONTHLY, dtstart=start, bymonthday=(start.day, -1), bysetpos=1)
Ugly and illegible, but it works. How can I do this with pandas? I've played with both date_range() and period_range(), so far with no luck.
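For reference, here is a minimal runnable sketch of the rrule approach described above. The helper name `monthly_same_day` and the `until=end` bound are additions for illustration and are not in the original one-liner. The `(start.day, -1)` pair with `bysetpos=1` picks the requested day when the month has it and the last day of the month otherwise.

```python
from datetime import datetime
from dateutil.rrule import rrule, MONTHLY


def monthly_same_day(start, end):
    """Hypothetical helper: monthly dates on start's day of month.

    Falls back to the last day of the month when that day does not exist.
    """
    return list(
        rrule(
            freq=MONTHLY,
            dtstart=start,
            until=end,                      # assumption: bound the rule at `end`
            bymonthday=(start.day, -1),     # candidate days within each month
            bysetpos=1,                     # keep only the first candidate
        )
    )


# The 15th of every month between the two dates from the question.
mid_month = monthly_same_day(datetime(2012, 1, 15), datetime(2012, 9, 20))

# A start on the 31st, to show the fallback to month end.
month_end = monthly_same_day(datetime(2012, 1, 31), datetime(2012, 4, 30))
```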
My actual goal is to use groupby, crosstab and/or resample to calculate values for each period based on sums/means/etc. of individual entries within the period. In other words, I want to transform data from:
                      total
    2012-01-10 00:01     50
    2012-01-15 01:01     55
    2012-03-11 00:01     60
    2012-04-28 00:01     80

    # Hypothetical usage
    dataframe.resample('total', how='sum', freq='M', start='2012-01-09', end='2012-04-15')
to
                total
    2012-01-09    105  # Values summed
    2012-02-09      0  # Missing from dataframe
    2012-03-09     60
    2012-04-09      0  # Data past end date, not counted
Given that Pandas originated as a financial analysis tool, I'm virtually certain that there's a simple and fast way to do this. Help appreciated!
freq='M' is for month-end frequencies (see here). But you can use .shift to shift it by any number of days (or any frequency for that matter):
pd.date_range(start, end, freq='M').shift(15, freq=pd.datetools.day)
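Note that shifting month-end dates forward by 15 days lands on the 15th of the following month, so the first period (2012-01-15) is skipped. As a hedged sketch of the same idea for more recent pandas releases (where, as far as I know, `pd.datetools` is no longer available), the two options below either shift month starts by a fixed number of days or step by one calendar month from the start date. The variable names are illustrative only.

```python
import pandas as pd

start = pd.Timestamp("2012-01-15")
end = pd.Timestamp("2012-09-20")

# Option 1: anchor on month starts ('MS'), then shift by a fixed number of days.
month_starts = pd.date_range(start.replace(day=1), end, freq="MS")
shifted = month_starts + pd.Timedelta(days=start.day - 1)
# Drop anything that falls outside the requested window.
shifted = shifted[(shifted >= start) & (shifted <= end)]

# Option 2: step by one calendar month at a time from the start date.
stepped = pd.date_range(start, end, freq=pd.DateOffset(months=1))

print(stepped)
```

Both options are only straightforward for days 1 through 28. With a fixed day shift, a start on the 29th to 31st spills into the next month in short months (for example, February 1 plus 30 days is in March), so check the behaviour on your pandas version before relying on either for month-end dates.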
There actually is no "day of month" frequency (e.g. "DOMXX" like "DOM09"), but I don't see any reason not to add one.
http://github.com/pydata/pandas/issues/2289
I don't have a simple workaround for you at the moment because resample requires passing a known frequency rule. I think it should also be augmented to take any date range to be used as arbitrary bin edges. Just a matter of time and hacking...
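Until something like that exists in resample, one possible workaround is to build the bin edges yourself and group on them. The sketch below is an assumption-laden illustration, not a pandas feature: it uses the sample data from the question, builds period starts with `pd.DateOffset(months=1)`, assigns each row to the latest period start at or before it via `searchsorted`, and fills empty periods with 0.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {"total": [50, 55, 60, 80]},
    index=pd.to_datetime(
        ["2012-01-10 00:01", "2012-01-15 01:01", "2012-03-11 00:01", "2012-04-28 00:01"]
    ),
)

start = pd.Timestamp("2012-01-09")
end = pd.Timestamp("2012-04-15")

# Period starts on the 9th of each month: 01-09, 02-09, 03-09, 04-09.
starts = pd.date_range(start, end, freq=pd.DateOffset(months=1))

# Ignore rows outside [start, end).
in_range = df[(df.index >= start) & (df.index < end)]

# For each row, find the position of the last period start that is <= its timestamp.
pos = starts.searchsorted(in_range.index, side="right") - 1

# Sum per period, then reindex so that empty periods show up as 0.
result = in_range["total"].groupby(starts[pos]).sum().reindex(starts, fill_value=0)

print(result)
```

Swapping `.sum()` for `.mean()` or another aggregation follows the same pattern, although `fill_value=0` may then not be the right placeholder for empty periods.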