 

Python / Pandas / Numpy - Direct calculation of number of business days between two dates excluding holidays

Is there a better / more direct way to calculate this than the following?

import datetime

import numpy as np
import pandas as pd

# 1. Set up the start and end date for which you want to calculate the
# number of business days excluding holidays.

start_date = '01JAN1986'
end_date = '31DEC1987'
start_date = datetime.datetime.strptime(start_date, '%d%b%Y')
end_date = datetime.datetime.strptime(end_date, '%d%b%Y')

# 2. Generate a list of holidays over this period
from pandas.tseries.holiday import USFederalHolidayCalendar
calendar = USFederalHolidayCalendar()
holidays = calendar.holidays(start_date, end_date)
holidays

This returns a DatetimeIndex (pandas.tseries.index.DatetimeIndex in the pandas version used here):

DatetimeIndex(['1986-01-01', '1986-01-20', '1986-02-17', '1986-05-26',
           '1986-07-04', '1986-09-01', '1986-10-13', '1986-11-11',
           '1986-11-27', '1986-12-25', '1987-01-01', '1987-01-19',
           '1987-02-16', '1987-05-25', '1987-07-03', '1987-09-07',
           '1987-10-12', '1987-11-11', '1987-11-26', '1987-12-25'],
          dtype='datetime64[ns]', freq=None, tz=None)

np.busday_count takes its holidays as a list (or array) of dates, so convert the index to a list of datetime.date objects:

holiday_date_list = holidays.date.tolist()
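A Python list is not strictly required: the NumPy documentation describes the `holidays` argument of `np.busday_count` as an array-like of `datetime64[D]` values. A minimal sketch, assuming only that, converts the DatetimeIndex directly by truncating its nanosecond timestamps to day precision:

```python
import numpy as np
from pandas.tseries.holiday import USFederalHolidayCalendar

# Holidays between the two dates, as a pandas DatetimeIndex
holidays = USFederalHolidayCalendar().holidays('1986-01-01', '1987-12-31')

# Truncate the datetime64[ns] values to datetime64[D] for NumPy
holiday_days = holidays.values.astype('datetime64[D]')

# busday_count counts weekdays in the half-open interval [begin, end)
count = np.busday_count('1986-01-01', '1987-12-31', holidays=holiday_days)
print(count)  # 501
```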

Then, without and with the holidays, you get:

np.busday_count(start_date.date(), end_date.date())
# 521

np.busday_count(start_date.date(), end_date.date(), holidays=holiday_date_list)
# 501
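Note that `np.busday_count` counts the half-open interval [begin, end), so 31 Dec 1987 (a Thursday, not a holiday) is not included in the 501 above. The sketch below is one way to count both endpoints by shifting the end date forward one day; it also builds a `np.busdaycalendar` once so the holiday list can be reused across repeated counts. The variable names are illustrative only.

```python
import numpy as np
from pandas.tseries.holiday import USFederalHolidayCalendar

start = np.datetime64('1986-01-01')
end = np.datetime64('1987-12-31')

# Build a holiday-aware NumPy business-day calendar once and reuse it
holidays = USFederalHolidayCalendar().holidays('1986-01-01', '1987-12-31')
cal = np.busdaycalendar(holidays=holidays.values.astype('datetime64[D]'))

# Half-open [start, end): 31 Dec 1987 is not counted
exclusive = np.busday_count(start, end, busdaycal=cal)

# Closed [start, end]: move the end forward one day to include it
inclusive = np.busday_count(start, end + np.timedelta64(1, 'D'), busdaycal=cal)

print(exclusive, inclusive)  # 501 502
```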

There are some similar questions, but they generally work with pandas Series or DataFrames (Get business days between start and end date using pandas, Counting the business days between two series).

asked May 15 '15 at 17:05 by InterwebIsGreat


1 Answer

If you put the two dates in the index of a DataFrame, you can use resample to fill in the gaps between them. The frequency passed to .resample() can be a business-day offset, including a custom one built from a holiday calendar:

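import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# Business-day offset that skips weekends and US federal holidays
C = pd.offsets.CustomBusinessDay(calendar=USFederalHolidayCalendar())

start_date = '01JAN1986'
end_date = '31DEC1987'

(
    # Explicit format so '01JAN1986'-style strings parse reliably
    pd.DataFrame(index=pd.to_datetime([start_date, end_date], format='%d%b%Y'))
    .resample(C, closed='right')  # resample at the custom business-day frequency
    .asfreq()                     # keep one row per bin, no aggregation
    .index
    .size
) - 1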

The size of the resampled index, minus 1, then gives the number of business days.
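A related pandas-only sketch (swapping in `pd.date_range` rather than the resample approach above) generates the business days themselves with the same `CustomBusinessDay` offset. Here both endpoints are included when they fall on a business day, and a start date that is a holiday (1 Jan 1986) is rolled forward to the next business day:

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# Business-day offset that skips weekends and US federal holidays
C = pd.offsets.CustomBusinessDay(calendar=USFederalHolidayCalendar())

# All business days in the closed interval [start, end]
days = pd.date_range('1986-01-01', '1987-12-31', freq=C)

print(len(days))                        # 502
print(days[0].date(), days[-1].date())  # 1986-01-02 1987-12-31
```

Because the range is closed on both ends, the count matches `np.busday_count` only after the end date is shifted forward one day.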

answered Oct 20 '22 at 19:10 by jorijnsmit