
How to perform join over date ranges using data.table?


How can I do the following (straightforward with sqldf) using data.table and get exactly the same result?

```r
library(data.table)

whatWasMeasured <- data.table(start = as.POSIXct(seq(1, 1000, 100), origin = "1970-01-01 00:00:00"),
                              end   = as.POSIXct(seq(10, 1000, 100), origin = "1970-01-01 00:00:00"),
                              x     = 1:10,
                              y     = letters[1:10])

measurments <- data.table(time = as.POSIXct(seq(1, 2000, 1), origin = "1970-01-01 00:00:00"),
                          temp = runif(2000, 10, 100))

## Alternative short names for data.tables
dt1 <- whatWasMeasured
dt2 <- measurments

## Straightforward with sqldf
library(sqldf)
sqldf("select * from measurments m, whatWasMeasured wwm where m.time between wwm.start and wwm.end")
```
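The sqldf query is an inner join whose condition is a range test rather than an equality: every row of measurments is paired with every row of whatWasMeasured, and only the pairs where time lies between start and end are kept (SQL `BETWEEN` is inclusive at both ends). As a rough reference for what a data.table version has to reproduce, here is a minimal base-R sketch of those semantics. The `intervals` and `pts` tables are hypothetical toy data, not the question's tables, and because the sketch builds the full cross product it is only practical for small inputs.

```r
# Hypothetical toy data (not the question's tables)
intervals <- data.frame(start = c(1, 101, 201),
                        end   = c(10, 110, 210),
                        label = c("a", "b", "c"),
                        stringsAsFactors = FALSE)
pts <- data.frame(time = c(5, 10, 50, 105, 300))

# merge() with by = NULL returns the Cartesian product of the two tables,
# like "from measurments m, whatWasMeasured wwm" in the SQL
pairs <- merge(pts, intervals, by = NULL)

# Keep only the pairs satisfying the BETWEEN condition (inclusive at both ends)
ref <- pairs[pairs$time >= pairs$start & pairs$time <= pairs$end, ]
ref <- ref[order(ref$time), ]
rownames(ref) <- NULL
print(ref)
```

Points 50 and 300 fall in no interval and are dropped, while 10 matches the first interval because the upper bound is inclusive.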
asked Dec 15 '14 by Samo





1 Answer

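You can use the foverlaps() function, which implements joins over intervals efficiently. foverlaps() expects a start and an end column in both tables, so in your case you just need a dummy column in measurments that duplicates time, turning each measurement into a zero-width interval [time, time].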

Note 1: You should install the development version of data.table (v1.9.5), since a bug in foverlaps() has been fixed there; CRAN releases from 1.9.6 onwards include the fix. You can find the installation instructions here.

Note 2: For convenience, I'll refer to whatWasMeasured as dt1 and measurments as dt2, as in the question.

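```r
require(data.table) ## 1.9.5+
# Each time becomes a zero-width interval [time, time]
# (dt2 is not a copy, so measurments gets the column too)
dt2[, dummy := time]

setkey(dt1, start, end) # foverlaps() needs its second table keyed on the interval
ans = foverlaps(dt2, dt1, by.x=c("time", "dummy"), nomatch=0L)[, dummy := NULL]
```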
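To see why the dummy column is enough, here is a small sketch of the same call on hypothetical toy data (`win` and `pts` are made up, not the question's tables). foverlaps() treats intervals as closed, so a point equal to a window's start or end still matches, just as with `BETWEEN`.

```r
library(data.table)

# Hypothetical toy data: two measurement windows and four time points
win <- data.table(start = c(1, 101), end = c(10, 110), label = c("a", "b"))
pts <- data.table(time = c(1, 10, 50, 105))

# Zero-width interval [time, time] for each point
pts[, dummy := time]

# The second table must be keyed on its interval columns
setkey(win, start, end)

# Default type = "any"; nomatch = 0L drops points outside every window
res <- foverlaps(pts, win, by.x = c("time", "dummy"), nomatch = 0L)
res[, dummy := NULL]
print(res)
```

Time 50 lies in no window and is dropped; times 1 and 10 sit exactly on the bounds of the first window and are kept.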

See ?foverlaps for more info and this post for a performance comparison.
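In data.table 1.9.8 and later, the same inclusive range join can also be written as a non-equi join through the `on=` argument, with no dummy column. This is a different technique from foverlaps(); the sketch below uses hypothetical toy data (`win`, `pts`), not the question's tables. `x.time` is used to return the point's own time explicitly, because in the default output of a non-equi join the x join column holds values taken from i.

```r
library(data.table) # non-equi joins need data.table >= 1.9.8

# Hypothetical toy data (not from the question)
win <- data.table(start = c(1, 101), end = c(10, 110), label = c("a", "b"))
pts <- data.table(time = c(1, 10, 50, 105))

# For each window in `win`, find the points with start <= time <= end.
# x.time is the matched point's own time; label comes from `win`.
# nomatch = 0L keeps only windows with at least one match (an inner join).
res <- pts[win, .(time = x.time, label),
           on = .(time >= start, time <= end), nomatch = 0L]
print(res)
```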

answered Sep 28 '22 by Arun
