My data contains a column of order dates. It also has a column of delivery dates. Some of the delivery dates are a date (12/31/1990) that occurred before the order date, which is causing problems in calculating average shipping time. I would like to take the order date for these rows and add a random number of days from a uniform distribution.
First, I tried to write a function that I could apply to the data, but the result was not what I wanted. What I want is for the simulated delivery date to end up in the delivery date column.
func1 = function(x){
if(x[2]=="1990-12-31" && !is.na(x[2]))
x[2] = as.Date(x[1]) + floor(runif(1,min=0,max=30))
return (x)
}
Example data:
x <- structure(list(orderDate = structure(c(15706, 15706, 15706, 15706,
15706), class = "Date"), deliveryDate = structure(c(15707, 15707,
7669, 15707, 7669), class = "Date")), .Names = c("orderDate",
"deliveryDate"), row.names = c(NA, 5L), class = "data.frame")
# orderDate deliveryDate
#1 2013-01-01 2013-01-02
#2 2013-01-01 2013-01-02
#3 2013-01-01 1990-12-31
#4 2013-01-01 2013-01-02
#5 2013-01-01 1990-12-31
If I did not get it wrong, x
is a data frame with 2 columns. A vectorized if
implementation can be achieved via ifelse
:
x[[2]] <- structure(ifelse(x[[2]] == "1990-12-31" & !is.na(x[[2]]),
as.Date(x[[1]]) + sample(0:30, 1),
x[[2]]),
class = "Date")
Or a faster replacement:
ind <- x[[2]] == "1990-12-31" & !is.na(x[[2]])
x[ind, 2] <- as.Date(x[ind, 1]) + sample(0:30, sum(ind), replace = TRUE)
With your example dataset and the same random seed 0, both options give the same result:
# orderDate deliveryDate
#1 2013-01-01 2013-01-02
#2 2013-01-01 2013-01-02
#3 2013-01-01 2013-01-28
#4 2013-01-01 2013-01-02
#5 2013-01-01 2013-01-28
In the first case, ifelse
alone is returning integers (the internal representation of "Date"), hence we need to give "Date" class to it to make it a "Date".
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With