
efficiently generate a random sample of times and dates between two dates

I have written a (fairly naive) function to randomly select a date/time between two specified days:

    # set start and end dates to sample between
    day.start <- "2012/01/01"
    day.end <- "2012/12/31"

    # define a random date/time selection function
    rand.day.time <- function(day.start,day.end,size) {
      dayseq <- seq.Date(as.Date(day.start),as.Date(day.end),by="day")
      dayselect <- sample(dayseq,size,replace=TRUE)
      hourselect <- sample(1:24,size,replace=TRUE)
      minselect <- sample(0:59,size,replace=TRUE)
      as.POSIXlt(paste(dayselect, hourselect,":",minselect,sep="") )
    }

Which results in:

    > rand.day.time(day.start,day.end,size=3)
    [1] "2012-02-07 21:42:00" "2012-09-02 07:27:00" "2012-06-15 01:13:00"

But this seems to be slowing down considerably as the sample size ramps up.

    # some benchmarking
    > system.time(rand.day.time(day.start,day.end,size=100000))
       user  system elapsed
       4.68    0.03    4.70
    > system.time(rand.day.time(day.start,day.end,size=200000))
       user  system elapsed
       9.42    0.06    9.49

Is anyone able to suggest how to do something like this in a more efficient manner?

thelatemail asked Feb 06 '13 03:02




2 Answers

Ahh, another date/time problem we can reduce to working in floats :)

Try this function:

    R> latemail <- function(N, st="2012/01/01", et="2012/12/31") {
    +     st <- as.POSIXct(as.Date(st))
    +     et <- as.POSIXct(as.Date(et))
    +     dt <- as.numeric(difftime(et,st,units="secs"))
    +     ev <- sort(runif(N, 0, dt))
    +     rt <- st + ev
    + }
    R>

We compute the difftime in seconds, and then "merely" draw uniforms over it, sorting the result. Add that to the start and you're done:

    R> set.seed(42); print(latemail(5))     ## round to date, or hour, or ...
    [1] "2012-04-14 05:34:56.369022 CDT" "2012-08-22 00:41:26.683809 CDT"
    [3] "2012-10-29 21:43:16.335659 CDT" "2012-11-29 15:42:03.387701 CST"
    [5] "2012-12-07 18:46:50.233761 CST"
    R> system.time(latemail(100000))
       user  system elapsed
      0.024   0.000   0.021
    R> system.time(latemail(200000))
       user  system elapsed
      0.044   0.000   0.045
    R> system.time(latemail(10000000))   ## a few more than in your example :)
       user  system elapsed
      3.240   0.172   3.428
    R>
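If whole minutes are wanted, as in the question, the same numeric-offset idea can be applied to integer minute counts instead of fractional seconds. The following is only a sketch under some assumptions of mine: the name `rand_minutes` is hypothetical, times are handled in UTC to sidestep daylight-saving gaps, and the end day is treated as inclusive (sampling runs up to, but not including, the following midnight).

```r
# Hypothetical helper (not part of the answer above): draw n random
# date-times at whole-minute resolution between two days, in UTC.
rand_minutes <- function(n, start = "2012/01/01", end = "2012/12/31") {
  st <- as.POSIXct(start, tz = "UTC")
  # Assumption: the end day is inclusive, so stop at the next midnight.
  et <- as.POSIXct(end, tz = "UTC") + 86400
  # Number of whole minutes in [st, et)
  n_min <- as.integer(round(as.numeric(difftime(et, st, units = "mins"))))
  # Integer offsets 0 .. n_min - 1, sampled with replacement
  offsets <- sample.int(n_min, n, replace = TRUE) - 1L
  st + 60 * offsets
}

set.seed(42)
x <- rand_minutes(5)
print(sort(x))
```

Because no strings are pasted or parsed, the work stays in vectorised arithmetic; call `sort()` on the result only if ordered output is needed.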
Dirk Eddelbuettel answered Oct 23 '22 06:10


Something like this will work too. Sorry for the random data frame, I just threw that in there so you could see a plot.

    data=as.data.frame(list(ID=1:10,
                       variable=rnorm(10,50,10)))

    #This function will generate a uniform sample of dates from
    #within a designated start and end date:
    rand.date=function(start.day,end.day,data){
      size=dim(data)[1]
      days=seq.Date(as.Date(start.day),as.Date(end.day),by="day")
      #sample integer positions so every day, including the last, is equally likely
      pick.day=sample.int(length(days),size,replace=TRUE)
      days[pick.day]
    }

    #This will create a new column within your data frame called date:
    data$date=rand.date("2014-01-01","2014-02-28",data)

    #and this will order your data frame by date:
    data=data[order(data$date),]

    #Finally, you can see how the data looks
    plot(data$date,data$variable,type="b")
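For date-only sampling, the day sequence does not strictly need to be built at all: a `Date` is stored as a day count, so integer offsets can be added to the start date directly. This is a minimal sketch, assuming an inclusive end date; the name `rand_dates` is mine and does not come from the answer.

```r
# Hypothetical helper: draw n random dates between start and end (inclusive)
# by adding integer day offsets to the start date.
rand_dates <- function(n, start, end) {
  start <- as.Date(start)
  end <- as.Date(end)
  # Number of days in the range, counting both endpoints
  n_days <- as.integer(difftime(end, start, units = "days")) + 1L
  # Offsets 0 .. n_days - 1, each equally likely
  start + (sample.int(n_days, n, replace = TRUE) - 1L)
}

set.seed(42)
d <- rand_dates(10, "2014-01-01", "2014-02-28")
print(sort(d))
```

The result is a `Date` vector, so it can be assigned to a data frame column and ordered in the same way as above.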
s_scolary answered Oct 23 '22 08:10