Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Join two data frames in R based on closest timestamp

Hi I have two tables (table1 and table2 below) and would like to join them based on the closest timestamp to form expected_output. Some kind of solution involving dplyr would be great if possible, but not if it further complicates things.

table1 = 
structure(list(date = structure(c(1437051300, 1434773700, 1431457200
), class = c("POSIXct", "POSIXt"), tzone = ""), val1 = c(94L, 
33L, 53L)), .Names = c("date", "val1"), row.names = c(NA, -3L
), class = "data.frame")

table2 = 
structure(list(date = structure(c(1430248288, 1435690482, 1434050843
), class = c("POSIXct", "POSIXt"), tzone = ""), val2 = c(67L, 
90L, 18L)), .Names = c("date", "val2"), row.names = c(NA, -3L
), class = "data.frame")

expected_output = 
structure(list(date = structure(c(1437051300, 1434773700, 1431457200
), class = c("POSIXct", "POSIXt"), tzone = ""), val1 = c(94L,
33L, 53L), val2 = c(90L, 18L, 67L)), .Names = c("date", "val1", 
"val2"), row.names = c(NA, -3L), class = "data.frame")
like image 891
John Avatar asked Aug 04 '15 20:08

John


1 Answers

Using rolling joins feature of data.table with roll = "nearest":

require(data.table) # v1.9.6+
setDT(table1)[, val2 := setDT(table2)[table1, val2, on = "date", roll = "nearest"]]

Here, val2 column is created by performing a join on the column date with roll = "nearest" option. For each row of table1$date, the closest matching row from table2$date is computed, and val2 for corresponding row is extracted.

like image 139
Arun Avatar answered Sep 22 '22 23:09

Arun