I need to write a function that will count the number of working days (minus weekends, and a vector of other local bank holidays), but the problem I'm coming up against is more simply illustrated with just counting the number of weekdays.
Here is a function that will give the number of weekdays between two dates:
removeWeekends <- function(end, start){
range <- as.Date(start:end, "1970-01-01")
range<- range[sapply(range, function(x){
if(!chron::is.weekend(x)){
return(TRUE)
}else{
return(FALSE)
}
})]
return(NROW(range))
}
Which works when it is given a single date for each argument:
removeWeekends(as.Date("2018-05-08"), as.Date("2018-06-08"))
#[1] 24
But when it is given two vectors from a data frame, it fails:
one <- as.Date("2017-01-01"):as.Date("2017-01-08")
two <- as.Date("2018-06-08"):as.Date("2018-06-15")
df <- data.frame(one, two)
removeWeekends(df$two, df$one)
#[1] 375
#Warning messages:
#1: In start:end : numerical expression has 8 elements: only the first used
#2: In start:end : numerical expression has 8 elements: only the first used
I've also tried (which I guessed would not work as the syntax seems off):
lapply(df, removeWeekends, df$two, df$one)
#Error in FUN(X[[i]], ...) : unused argument (17167:17174)
And:
lapply(df[,c("two", "one")], removeWeekends)
#Error in as.Date(start:end, "1970-01-01") : argument "start" is missing,
# with no default
I'm assuming it is me misunderstanding the concept of vectorization.
The only other idea I've got is nesting the function within a conditional that checks whether the input is a vector and, if so, calls an apply function on it, although I'm not quite sure how I would structure that either.
Most of R's functions are vectorized, meaning that the function will operate on all elements of a vector without needing to loop through and act on each element one at a time. This makes writing code more concise, easy to read, and less error prone.
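As a small illustration of the point above (the vectors are made up for the example, not taken from the question): arithmetic acts on whole vectors at once, whereas the `:` operator expects a single value on each side, so pairing up starts and ends needs an explicit element-wise helper such as Map.

```r
# Made-up example vectors, not taken from the question
starts <- c(1L, 5L)
ends   <- c(3L, 6L)

# Arithmetic is vectorized: one call handles every pair of elements
ends - starts + 1L
# [1] 3 2

# `:` is not vectorized over its operands, so build one sequence per pair
ranges <- Map(`:`, starts, ends)
lengths(ranges)
# [1] 3 2
```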
R's Vectorize function is provided for convenience, not for performance: it is a wrapper around mapply, so the underlying function is still called once per element.
The basic syntax for a custom R function is FunctionName = function(Argument(s)) {Statement(s)}. Every function is assigned a name, FunctionName; it ends up as an object in your workspace and is called by that name. Argument(s) represent the input data objects, which can range from one to several.
You have a couple of options for supporting vectorized arguments in your function. Since you have already written the function, the easiest option is to use Vectorize to convert it so that it accepts vectorized arguments. Another option is to rewrite the function so that it handles vectorized arguments itself.
Option#1: Using Vectorize
# Function will support vectorized argument with single statement
vremoveWeekends <- Vectorize(removeWeekends)
# Try vremoveWeekends function
df$dayswithoutweekends <- vremoveWeekends(df$two, df$one)
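For context, Vectorize does not make the underlying computation faster: it returns a wrapper that hands the arguments to mapply, so the original function still runs once per pair of elements. A minimal sketch with a toy function (countBetween is a hypothetical stand-in, not the OP's function):

```r
# Hypothetical scalar-only function: number of integers from start to end
countBetween <- function(end, start) length(start:end)

# Wrap it so that both arguments may be vectors
vcountBetween <- Vectorize(countBetween)

ends   <- c(5L, 10L)
starts <- c(1L, 8L)

via_vectorize <- vcountBetween(ends, starts)
via_mapply    <- mapply(countBetween, ends, starts)

via_vectorize
# [1] 5 3
all(via_vectorize == via_mapply)
# [1] TRUE
```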
Option#2: Rewrite the function to support vectorized arguments. I prefer this option because the OP has two arguments that are expected to be of the same length, so it is easier to perform error checking on the arguments if we rewrite the function.
# Modified function
removeWeekendsNew <- function(end, start){
if(length(start) != length(end)){
return(0L) #Error condition
}
result <- rep(0L, length(start)) #store the result for each row
#One can use mapply instead of for-loop. But for-loop will be faster
for(i in seq_along(start)){
range = seq(start[i], end[i], by="day")
result[i] = length(range[!chron::is.weekend(range)])
}
return(result)
}
#Use new function:
df$dayswithoutweekends <- removeWeekendsNew(df$two, df$one)
Result: It is the same for both options mentioned above.
df
#          one        two dayswithoutweekends
# 1 2017-01-01 2018-06-08                 375
# 2 2017-01-02 2018-06-09                 375
# 3 2017-01-03 2018-06-10                 374
# 4 2017-01-04 2018-06-11                 374
# 5 2017-01-05 2018-06-12                 374
# 6 2017-01-06 2018-06-13                 374
# 7 2017-01-07 2018-06-14                 374
# 8 2017-01-08 2018-06-15                 375
Data:
one <- seq(as.Date("2017-01-01"),as.Date("2017-01-08"), by="day")
two <- seq(as.Date("2018-06-08"),as.Date("2018-06-15"), by="day")
df <- data.frame(one, two)
df
#          one        two
# 1 2017-01-01 2018-06-08
# 2 2017-01-02 2018-06-09
# 3 2017-01-03 2018-06-10
# 4 2017-01-04 2018-06-11
# 5 2017-01-05 2018-06-12
# 6 2017-01-06 2018-06-13
# 7 2017-01-07 2018-06-14
# 8 2017-01-08 2018-06-15
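The original question also mentions excluding a vector of local bank holidays. Here is a minimal sketch of how Option#2 could be extended for that, assuming the holidays are supplied as Date values and that each start is not later than its end. countWorkdays and its holidays argument are hypothetical names, and base R's as.POSIXlt is swapped in for chron::is.weekend so that the example has no package dependency.

```r
# Hypothetical helper: working days per row, minus weekends and holidays
countWorkdays <- function(end, start, holidays = as.Date(character(0))) {
  stopifnot(length(start) == length(end))
  holidays <- as.Date(holidays)
  vapply(seq_along(start), function(i) {
    days <- seq(start[i], end[i], by = "day")
    wday <- as.POSIXlt(days)$wday   # 0 = Sunday, ..., 6 = Saturday
    # keep days that are neither a weekend day nor a listed holiday
    sum(!(wday %in% c(0L, 6L)) & !(days %in% holidays))
  }, integer(1))
}

countWorkdays(as.Date("2018-06-08"), as.Date("2018-05-08"))
# [1] 24

# 2018-05-28 is used purely as an example holiday (it falls on a Monday)
countWorkdays(as.Date("2018-06-08"), as.Date("2018-05-08"),
              holidays = as.Date("2018-05-28"))
# [1] 23
```

A holiday that falls on a weekend is not subtracted twice, because each day is tested against both conditions at once.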
If you want to fully vectorize this, you will need to think outside the box. All that chron::is.weekend does here is tell us which days in a given time period were Saturdays or Sundays. We can count these ourselves in a vectorized way because each full week contains exactly two weekend days, and the only tricky part is the leftover days.
I wrote the following function to achieve this, though I'm sure it could be improved:
frw <- function(two, one) {
diff_d <- two - one ## difference in days
l_d <- (two + 4L) %% 7L + 1L ## weekday of the end date (1 = Sunday, ..., 7 = Saturday)
weeks <- diff_d %/% 7L ## number of weeks between
days <- diff_d %% 7L ## days left
## calculate how many work days left
diff_d -
((weeks * 2L) + ((l_d - days < 1) + ((l_d - days < 2) - (l_d == 1L))) +
(l_d %in% c(1L, 7L))) + 1L
}
You can run it on the integer day vectors from the question as follows:
frw(two, one)
## [1] 375 375 374 374 374 374 374 375
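An alternative way to arrive at the same counts, sketched here under the assumption that end is never earlier than start, is to count the Saturdays and Sundays in each range directly with integer division. weekdaysBetween is a hypothetical name, not part of the answer above. Because `%%` and `%/%` are not defined for Date objects, the sketch first coerces its inputs to integer day counts since 1970-01-01 (a Thursday), so it accepts either Dates or the integer vectors from the question.

```r
# Hypothetical helper: weekdays in [start, end], both ends inclusive
weekdaysBetween <- function(end, start) {
  e <- as.integer(end)     # days since 1970-01-01
  s <- as.integer(start)
  # Day n is a Saturday when n %% 7 == 2 and a Sunday when n %% 7 == 3.
  # The number of n in [s, e] with n %% 7 == r is
  #   (e - r) %/% 7 - (s - 1 - r) %/% 7
  saturdays <- (e - 2L) %/% 7L - (s - 3L) %/% 7L
  sundays   <- (e - 3L) %/% 7L - (s - 4L) %/% 7L
  (e - s + 1L) - saturdays - sundays
}

weekdaysBetween(as.Date("2018-06-08"), as.Date("2018-05-08"))
# [1] 24
```

Every step is plain vector arithmetic, so the function handles whole columns in one call without any looping.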
It is far faster than the mapply-based version (almost instant). Here is a benchmark on a bigger data set:
one <- as.Date("2017-01-01"):as.Date("2030-01-08")
two <- as.Date("2017-05-01"):as.Date("2030-05-08")
df <- data.frame(one, two)
system.time(res_mapply <- vremoveWeekends(df$two, df$one)) # taken from the other answer
# user system elapsed
# 76.46 0.06 77.25
system.time(res_vectorized <- frw(df$two, df$one))
# user system elapsed
# 0 0 0
identical(res_mapply, res_vectorized)
# [1] TRUE