I have written a (fairly naive) function to randomly select a date/time between two specified days:

```r
# set start and end dates to sample between
day.start <- "2012/01/01"
day.end <- "2012/12/31"

# define a random date/time selection function
rand.day.time <- function(day.start, day.end, size) {
  dayseq <- seq.Date(as.Date(day.start), as.Date(day.end), by = "day")
  dayselect <- sample(dayseq, size, replace = TRUE)
  hourselect <- sample(0:23, size, replace = TRUE)  # valid hours are 0-23
  minselect <- sample(0:59, size, replace = TRUE)
  # separate the date and the time with a space before parsing
  as.POSIXlt(paste0(dayselect, " ", hourselect, ":", minselect))
}
```
Which results in:
```
> rand.day.time(day.start, day.end, size = 3)
[1] "2012-02-07 21:42:00" "2012-09-02 07:27:00" "2012-06-15 01:13:00"
```
But it gets slow as the sample size ramps up:
```
# some benchmarking
> system.time(rand.day.time(day.start, day.end, size = 100000))
   user  system elapsed
   4.68    0.03    4.70
> system.time(rand.day.time(day.start, day.end, size = 200000))
   user  system elapsed
   9.42    0.06    9.49
```
Can anyone suggest a more efficient way to do this?
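For comparison, here is a sketch that keeps the question's day/hour/minute sampling scheme but builds the timestamps with arithmetic instead of pasting strings and parsing them with `as.POSIXlt()`, which is plausibly where most of the time goes. The name `rand_day_time_arith` and the `tz = "UTC"` default are assumptions for illustration; UTC is used so that adding whole days in seconds is not disturbed by daylight-saving shifts.

```r
# Hypothetical helper: same sampling as rand.day.time(), without string parsing.
rand_day_time_arith <- function(day.start, day.end, size, tz = "UTC") {
  # number of calendar days in [day.start, day.end], both inclusive
  n_days <- as.integer(as.Date(day.end) - as.Date(day.start)) + 1L
  day_off <- sample.int(n_days, size, replace = TRUE) - 1L  # 0 .. n_days - 1
  hour <- sample(0:23, size, replace = TRUE)
  minute <- sample(0:59, size, replace = TRUE)
  # midnight of the start day in the chosen time zone
  start <- as.POSIXct(as.character(as.Date(day.start)), tz = tz)
  # assumption: with a DST-observing tz, day offsets could shift by an hour
  start + day_off * 86400 + hour * 3600 + minute * 60
}
```

Usage follows the original: `rand_day_time_arith(day.start, day.end, size = 3)` returns a `POSIXct` vector at minute resolution.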
In Excel, for comparison, you can generate random dates between two dates by combining the RANDBETWEEN function with the DATE function, for example =RANDBETWEEN(DATE(2016,1,1),DATE(2016,12,31)). The formula is then copied down from B5 to B11, giving random dates between Jan 1, 2016 and Dec 31, 2016 (random dates in the year 2016).
Ahh, another date/time problem we can reduce to working in floats :)
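Try this function:

```r
latemail <- function(N, st = "2012/01/01", et = "2012/12/31") {
  # start and end as date-times (midnight of each day)
  st <- as.POSIXct(as.Date(st))
  et <- as.POSIXct(as.Date(et))
  # length of the interval in seconds
  dt <- as.numeric(difftime(et, st, units = "secs"))
  # N sorted uniform offsets within the interval
  ev <- sort(runif(N, 0, dt))
  # shift the offsets by the start time (returned invisibly)
  rt <- st + ev
}
```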
We compute the difftime in seconds and then "merely" draw uniforms over it, sorting the result. Add that to the start and you're done:
```
R> set.seed(42); print(latemail(5))   ## round to date, or hour, or ...
[1] "2012-04-14 05:34:56.369022 CDT" "2012-08-22 00:41:26.683809 CDT"
[3] "2012-10-29 21:43:16.335659 CDT" "2012-11-29 15:42:03.387701 CST"
[5] "2012-12-07 18:46:50.233761 CST"
R> system.time(latemail(100000))
   user  system elapsed
  0.024   0.000   0.021
R> system.time(latemail(200000))
   user  system elapsed
  0.044   0.000   0.045
R> system.time(latemail(10000000))   ## a few more than in your example :)
   user  system elapsed
  3.240   0.172   3.428
```
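The transcript's comment mentions rounding. If you want the question's minute resolution instead of fractional seconds, one option is to apply the same "work in numbers" idea to whole minutes: draw integer minute offsets and add them to the start. This is a sketch under stated assumptions: `rand_minutes` is a hypothetical name, the interval is [st, et) with `et` excluded, and times are in UTC so the CDT/CST switch seen above does not matter.

```r
# Hypothetical helper: uniform random timestamps on whole minutes in [st, et).
rand_minutes <- function(N, st = "2012/01/01", et = "2012/12/31", tz = "UTC") {
  st <- as.POSIXct(st, format = "%Y/%m/%d", tz = tz)
  et <- as.POSIXct(et, format = "%Y/%m/%d", tz = tz)
  # number of whole minutes in the interval
  n_min <- as.numeric(difftime(et, st, units = "mins"))
  # integer offsets 0 .. n_min - 1, drawn with replacement
  offs <- sample.int(n_min, N, replace = TRUE) - 1L
  st + offs * 60
}
```

Unlike `latemail()`, the result is not sorted; wrap it in `sort()` if you want ordered output.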
Something like this will work too. Sorry for the random data frame; I just threw it in so you could see a plot.
```r
data <- as.data.frame(list(ID = 1:10, variable = rnorm(10, 50, 10)))

# This function will generate a uniform sample of dates from
# within a designated start and end date (both inclusive):
rand.date <- function(start.day, end.day, data) {
  size <- nrow(data)
  days <- seq.Date(as.Date(start.day), as.Date(end.day), by = "day")
  # pick whole-day indices with replacement; indexing with runif()
  # values truncates them and would never select the last day
  days[sample.int(length(days), size, replace = TRUE)]
}

# This will create a new column within your data frame called date:
data$date <- rand.date("2014-01-01", "2014-02-28", data)

# and this will order your data frame by date:
data <- data[order(data$date), ]

# Finally, you can see how the data looks
plot(data$date, data$variable, type = "b")
```
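A variation on the same idea that skips building the full sequence of days: add random whole-day offsets directly to the start date, since `Date` values support integer arithmetic. `rand_dates` is a hypothetical helper name, not part of the answer above; both end dates are treated as inclusive.

```r
# Hypothetical helper: n uniform random dates in [start.day, end.day].
rand_dates <- function(n, start.day, end.day) {
  start <- as.Date(start.day)
  # number of calendar days, counting both endpoints
  n_days <- as.integer(as.Date(end.day) - start) + 1L
  # offsets 0 .. n_days - 1 added to the start keep the Date class
  start + sample.int(n_days, n, replace = TRUE) - 1L
}
```

For the data frame above, `rand_dates(nrow(data), "2014-01-01", "2014-02-28")` would fill the `date` column the same way.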