
performance of pandas custom business day offset

For a large number of dates, I need to compute the next business day, taking holidays into account.

Currently, I'm using something like the code below, which I've pasted from an IPython notebook:

import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

cal = USFederalHolidayCalendar()
# pd.datetools was removed from later pandas releases; pd.offsets exposes the same classes
bday_offset = lambda n: pd.offsets.CustomBusinessDay(n, calendar=cal)

mydate = pd.to_datetime("12/24/2014")
%timeit with_holiday = mydate + bday_offset(1)
%timeit without_holiday = mydate + pd.offsets.BDay(1)

On my computer, the with_holiday line runs in ~12 milliseconds, while the without_holiday line runs in ~15 microseconds.

Is there any way to make the bday_offset function faster?

hahdawg asked Jul 20 '15 18:07



1 Answer

I think building the offset inside the lambda is what slows it down, since every call constructs a new CustomBusinessDay. Consider this approach (taken more or less straight from the documentation):

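from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

# Build the offset once, outside of any per-date call
bday_us = CustomBusinessDay(calendar=USFederalHolidayCalendar())
mydate + bday_us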

Out[13]: Timestamp('2014-12-26 00:00:00')

Constructing the offset is slow, but you only need to do it once. Adding it to a date is very fast:

%timeit bday_us = CustomBusinessDay(calendar=USFederalHolidayCalendar())
10 loops, best of 3: 66.5 ms per loop

%timeit mydate + bday_us
10000 loops, best of 3: 44 µs per loop
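
Since you have many dates, the same idea carries over: build the offset once and reuse it for every date. Here is a minimal sketch of that; the values in `dates` are made-up samples, not taken from your data:

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

# Build the offset a single time; this is the expensive step.
bday_us = CustomBusinessDay(calendar=USFederalHolidayCalendar())

# Hypothetical sample dates standing in for "a ton of dates".
dates = pd.to_datetime(["2014-12-24", "2014-12-31"])

# Reuse the same offset object for every date instead of rebuilding it.
next_bdays = [d + bday_us for d in dates]
```

The first result should match the output shown earlier, since 2014-12-25 is a federal holiday and gets skipped.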

For an apples-to-apples comparison, here are the timings of your original code on my machine:

%timeit with_holiday = mydate + bday_offset(1)
10 loops, best of 3: 23.1 ms per loop

%timeit without_holiday = mydate + pd.offsets.BDay(1)
10000 loops, best of 3: 36.6 µs per loop
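
If you want to keep a `bday_offset(n)` function that takes the number of days, one option is to cache the offsets so that each value of `n` is only constructed once. This is just a sketch using `functools.lru_cache` from the standard library, not something taken from the pandas documentation, and I have not timed it:

```python
from functools import lru_cache

import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

# Create the calendar once at module level.
cal = USFederalHolidayCalendar()


@lru_cache(maxsize=None)
def bday_offset(n=1):
    """Return a CustomBusinessDay of n days, constructed once per distinct n."""
    return CustomBusinessDay(n, calendar=cal)


mydate = pd.to_datetime("12/24/2014")
with_holiday = mydate + bday_offset(1)  # first call constructs the offset
again = mydate + bday_offset(1)         # later calls reuse the cached object
```

The call sites stay the same as in your question; only the first call for a given `n` pays the construction cost.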
JohnE answered Sep 28 '22 11:09