Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Vectorise find closest date function

Tags:

date

r

I would like to pass in a vector of dates, and have returned the closest date from a second vector of (partially matching) dates.

The following function does what I require for a single date, however i cannot figure out how to generalise this to the case where searchDate is a vector of dates.

closestDate <- function(searchDate, dateList, roundDown=FALSE){
  if (roundDown) {
    dist2date <- as.Date(dateList) - as.Date(searchDate)
    closest <- which(max(dist2date[dist2date<=0]) == dist2date)
  } else {
    dist2date <- as.Date(dateList) - as.Date(searchDate)
    closest <- which(min(dist2date[dist2date>=0]) == dist2date)
  }
  return(dateList[closest])
}

dateSeq <- seq(as.Date("2011-01-01"), as.Date("2012-12-19"), by='day')
oddDates <- dateSeq[as.logical(1:length(dateSeq) %%2)]

closestDate('2012-12-14', oddDates)
[1] "2012-12-15"

miscDatesLong <- rep(c('2012-12-14', '2012-12-16', '2012-12-18'), 100 )
closestDate(miscDatesLong, oddDates)

closestDate(miscDatesLong, oddDates)
[1] "2012-12-15" "2012-12-17" "2012-12-19"
Warning message:
In unclass(time1) - unclass(time2) :
  longer object length is not a multiple of shorter object length

Could someone please help?

like image 932
ricardo Avatar asked Dec 19 '12 06:12

ricardo


4 Answers

The findInterval function can do this quickly:

dateSeq <- seq(as.Date("2011-01-01"), as.Date("2012-12-19"), by='day')
oddDates <- dateSeq[as.logical(1:length(dateSeq) %%2)]

oddDates[ findInterval(as.Date('2012-12-14'), oddDates)+1 ]

miscDatesLong <- rep(c('2012-12-14', '2012-12-16', '2012-12-18'), 100)

oddDates[ findInterval(as.Date(miscDatesLong), oddDates) + 1 ]

To round down instead of up remove the +1. If you really want to find the closest date, not the one just before or after you can create a new list of dates that are the midpoints of the intervals (as.Date(rowMeans(embed(as.numeric(oddDates),2)), '1960-01-01')) and use findInterval on those. See the arguments to findInterval for other options.

like image 115
Greg Snow Avatar answered Nov 10 '22 06:11

Greg Snow


?Vectorize

> closestDateV = Vectorize(closestDate,"searchDate")
> closestDateV(c('2012-12-15','2012-12-14'), oddDates)
2012-12-15 2012-12-14 
     15689      15689 

The returned values have had their date-ness removed. So add it back:

> as.Date(closestDateV(c('2012-12-15','2012-12-14'), oddDates),origin="1970-01-01")
  2012-12-15   2012-12-14 
"2012-12-15" "2012-12-15" 

You might want to wrap that all up in a new function.

Functional programming is fun!

like image 38
Spacedman Avatar answered Nov 10 '22 06:11

Spacedman


Now, with the example, just work on the subset of dates that are less than in one case or greater than in the other case, the particular target being examined at the time.

closestDt <- function(searchDate, dateList, roundDown=FALSE) 
     as.Date( sapply( searchDate , function (x) if( roundDown ){ 
                max( dateList[ dateList <= x ] ) } else {
                min( dateList[ dateList >= x])  } 
           ), "1970-01-01")
like image 20
IRTFM Avatar answered Nov 10 '22 04:11

IRTFM


# initiate a tie-breaking function
tie.breaker <-
    function( x , y , la = look.after ){

        # if look.after is TRUE, eliminate all values below x
        # otherwise, eliminate all values above x
        if ( la ) y[ y < x ] <- NA else y[ y > x ] <- NA

        # then among the remaining values, figure out the date the shortest distance away
        z <- which.min( abs( x - y ) )[1]
        # use [1] to just take the first result, in case y contains duplicate dates

        # return z
        return( z )
    }

# initiate your main function
closestDate <- 
    function( searchDate , dateList , look.after = FALSE ){

        # apply a which.min( abs( ) ) command to each of the dates given, 
        # across every date in the larger list
        dist2date <- 
            sapply( 

                # on every element of searchDate..
                as.Date( searchDate ) ,

                # ..run the tie.breaker() function
                tie.breaker , 

                # and each time, pass in the dateList
                as.Date( dateList ) ,

                # and also the look.after TRUE/FALSE flag
                look.after
            )

        # return the matching dates in the same order as passed in
        dateList[ dist2date ]
    }

# try with two input dates
searchDate <- c( '2012-12-14' , '2012-11-18' )

# create a few dates to test against..
someDates <- c( '2012-11-12' ,  '2012-11-17' , '2012-12-15' , '2012-12-13' , '2012-12-15' , '2012-11-17' , '2012-11-20' )

# return the two dates closests to the inputted dates

# the first result gives 12/13, because look.after = FALSE
closestDate( searchDate , someDates )

# the first result gives 12/15, because look.after = TRUE
closestDate( searchDate , someDates , look.after = TRUE )

# reverse the order to prove it still works
someDates <- c( '2012-11-12' , '2012-11-17' , '2012-12-13' , '2012-12-15' , '2012-12-13' , '2012-12-15' , '2012-11-17' )

# the first result gives 12/13, because look.after = FALSE
closestDate( searchDate , someDates )

# the first result gives 12/15, because look.after = TRUE
closestDate( searchDate , someDates , look.after = TRUE )
like image 41
Anthony Damico Avatar answered Nov 10 '22 04:11

Anthony Damico