
Date ranges in Pandas


After fighting with NumPy and dateutil for days, I recently discovered the amazing Pandas library. I've been poring through the documentation and source code, but I can't figure out how to get date_range() to generate indices at the right breakpoints.

    from datetime import date
    import pandas as pd

    start = date(2012, 1, 15)
    end = date(2012, 9, 20)
    # 'M' is month-end, instead I need same-day-of-month
    pd.date_range(start, end, freq='M')

What I want:

    2012-01-15
    2012-02-15
    2012-03-15
    ...
    2012-09-15

What I get:

    2012-01-31
    2012-02-29
    2012-03-31
    ...
    2012-08-31

I need month-sized chunks that account for the variable number of days in a month. This is possible with dateutil.rrule:

rrule(freq=MONTHLY, dtstart=start, bymonthday=(start.day, -1), bysetpos=1) 

Ugly and illegible, but it works. How can I do this with pandas? I've played with both date_range() and period_range(), so far with no luck.
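For reference, here is a self-contained sketch of the rrule call above. The `until=end` argument and the helper name `monthly_same_day` are assumptions added for illustration; the original call does not say where the series stops. The second call shows the fallback to the last day of the month when the start day does not exist in a shorter month.

```python
from datetime import datetime
from dateutil.rrule import rrule, MONTHLY


def monthly_same_day(start, end):
    """Hypothetical helper: one date per month on start's day of month,
    falling back to the month's last day when that day does not exist."""
    return list(
        rrule(
            freq=MONTHLY,
            dtstart=start,
            until=end,  # assumption: the original call left the end unspecified
            bymonthday=(start.day, -1),  # candidates: the target day and the last day
            bysetpos=1,  # keep whichever candidate comes first in each month
        )
    )


dates = monthly_same_day(datetime(2012, 1, 15), datetime(2012, 9, 20))
clamped = monthly_same_day(datetime(2012, 1, 31), datetime(2012, 5, 1))
```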

My actual goal is to use groupby, crosstab and/or resample to calculate values for each period based on sums/means/etc of individual entries within the period. In other words, I want to transform data from:

                        total
    2012-01-10 00:01       50
    2012-01-15 01:01       55
    2012-03-11 00:01       60
    2012-04-28 00:01       80

    # Hypothetical usage
    dataframe.resample('total', how='sum', freq='M', start='2012-01-09', end='2012-04-15')

to

                  total
    2012-01-09      105   # Values summed
    2012-02-09        0   # Missing from dataframe
    2012-03-09       60
    2012-04-09        0   # Data past end date, not counted

Given that Pandas originated as a financial analysis tool, I'm virtually certain that there's a simple and fast way to do this. Help appreciated!

knite asked Nov 18 '12 22:11




2 Answers

freq='M' is for month-end frequencies. But you can use .shift to shift the result by any number of days (or by any other frequency, for that matter):

pd.date_range(start, end, freq='M').shift(15, freq=pd.datetools.day) 
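On recent pandas releases `pd.datetools` no longer exists, so the snippet above may need adapting. The following is a minimal sketch, assuming the `'D'` alias is passed to `shift` and the `pd.offsets.MonthEnd()` object is used in place of the `'M'` alias. It also shows one possible way to include the start date itself, which the plain shift skips; that second step is an illustration, not part of the original answer, and only holds while the target day is 28 or lower.

```python
import pandas as pd

start = pd.Timestamp("2012-01-15")
end = pd.Timestamp("2012-09-20")

# Month ends inside [start, end], each pushed forward by 15 days.
month_ends = pd.date_range(start, end, freq=pd.offsets.MonthEnd())
shifted = month_ends.shift(15, freq="D")  # begins at 2012-02-15, not at start

# To also cover the start date, begin one month end earlier and trim at end.
# Only valid while start.day <= 28, since every month has at least 28 days.
earlier = pd.date_range(start - pd.offsets.MonthEnd(), end, freq=pd.offsets.MonthEnd())
with_start = earlier.shift(start.day, freq="D")
with_start = with_start[with_start <= end]
```

Month end plus N days always lands on day N of the following month, which is why the shift produces same-day-of-month dates here.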
Matti John answered Dec 10 '22 10:12



There actually is no "day of month" frequency (e.g. "DOMXX" like "DOM09"), but I don't see any reason not to add one.

http://github.com/pydata/pandas/issues/2289

I don't have a simple workaround for you at the moment because resample requires passing a known frequency rule. I think it should also be extended to accept any date range as arbitrary bin edges. Just a matter of time and hacking...
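Until resample accepts arbitrary bin edges, one possible approach is to build the edges by hand and group on them. This is a rough sketch rather than a built-in pandas feature: the sample frame reproduces the data from the question, the variable names are made up for illustration, and the bins are assumed to be half-open, so a row falling exactly on the end date would be excluded.

```python
import pandas as pd

df = pd.DataFrame(
    {"total": [50, 55, 60, 80]},
    index=pd.to_datetime(
        ["2012-01-10 00:01", "2012-01-15 01:01", "2012-03-11 00:01", "2012-04-28 00:01"]
    ),
)
start, end = pd.Timestamp("2012-01-09"), pd.Timestamp("2012-04-15")

# Same-day-of-month left edges, each computed from start so that a late
# start day (29 to 31) is clamped per month instead of drifting.
n_months = (end.year - start.year) * 12 + (end.month - start.month)
left = [start] + [start + pd.DateOffset(months=i) for i in range(1, n_months + 1)]
left = pd.DatetimeIndex([d for d in left if d < end])
edges = left.append(pd.DatetimeIndex([end]))

# Position of the bin each row falls into; bins are half-open [left, right).
pos = edges.searchsorted(df.index, side="right") - 1
in_range = (pos >= 0) & (pos < len(left))

result = (
    df.loc[in_range, "total"]
    .groupby(left[pos[in_range]])  # label each row with its bin's left edge
    .sum()
    .reindex(left, fill_value=0)  # keep empty periods as 0
)
```

Rows outside the start and end dates are dropped before grouping, and the final reindex restores periods that contain no rows.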

Wes McKinney answered Dec 10 '22 11:12
