I have the following dataframes (df11 and df22) I'd like to do a merge/full join on with "UserID=UserID" and date difference <= 30 . So if the UserIDs match up AND the date's are less than or equal to 30, I'd like them merged into one singular row. I've looked at fuzzy join here and sqldf here but I can't figure out how to implement either of those for my data frames.
df1 <- structure(list(UserID = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L),
Full.Name = c( "John Smith", "Jack Peters", "Bob Brown", "Jane Doe", "Jackie Jane", "Sarah Brown", "Chloe Brown", "John Smith" ),
Info = c("yes", "no", "yes", "yes", "yes", "yes", "no", "yes"),
EncounterID = c(13L, 14L, 15L, 16L, 17L, 18L, 19L, 13L), DateTime = c("1/2/21 00:00", "1/5/21 12:00", "1/1/21 1:31", "1/5/21 3:34", "5/9/21 5:33", "5/8/21 3:39", "12/12/21 2:30", "12/11/21 9:21"),
Temp = c("100", "103", "104", "103", "101", "102", "103", "105"),
misc = c("(null)", "no", "(null)", "(null)", "(null)","(null)", "(null)", "(null)"
)),
class = "data.frame", row.names = c(NA,
-8L))
df2 <- structure(list(UserID = c(1L, 2L, 3L, 4L, 5L, 6L),
Full.Name = c("John Smith", "Jack Peters", "Bob Brown", "Jane Doe", "Jackie Jane", "Sarah Brown"),
DOB = c("1/1/90", "1/10/90", "1/2/90", "2/20/80", "2/2/80", "12/2/80"),
EncounterID = c(13L, 14L, 15L, 16L, 17L, 18L), EncounterDate = c("1/1/21", "1/2/21", "1/1/21", "1/6/21", "5/7/21", "5/8/21"),
Type = c("Intro", "Intro", "Intro", "Intro", "Care", "Out"),
responses = c("(null)", "no",
"yes", "no", "no", "unsat")),
class = "data.frame", row.names = c(NA,
-6L))
loadedNamespaces()
install.packages("Rcpp")
library(dplyr)
library(tidyr)
install.packages("lubridate")
library(lubridate)
df11 <-
df1 %>%
separate(DateTime, c("Date", "Time"), sep=" ") %>%
mutate(Date = as_datetime(mdy(Date))) %>%
select(-Time) %>%
as_tibble()
df22 <-
df2 %>%
mutate(across(c(EncounterDate), mdy)) %>%
mutate(across(c(EncounterDate), as_datetime)) %>%
as_tibble()
@r2evans After running the first set of code, I get the following output. Which is slightly different from yours.
df11 <- mutate(df11, Date_m30 = Date %m-% days(30), Date_p30 = Date %m+% days(30))
df11
# A tibble: 8 x 7
UserID Full.Name Info EncounterID Date Temp misc
<int> <chr> <chr> <int> <dttm> <chr> <chr>
1 1 John Smith yes 13 2021-01-02 00:00:00 100 (null)
2 2 Jack Peters no 14 2021-01-05 00:00:00 103 no
3 3 Bob Brown yes 15 2021-01-01 00:00:00 104 (null)
4 4 Jane Doe yes 16 2021-01-05 00:00:00 103 (null)
5 5 Jackie Jane yes 17 2021-05-09 00:00:00 101 (null)
6 6 Sarah Brown yes 18 2021-05-08 00:00:00 102 (null)
7 7 Chloe Brown no 19 2021-12-12 00:00:00 103 (null)
8 1 John Smith yes 13 2021-12-11 00:00:00 105 (null)
One way is to first create "+/- 30 day" columns in one of them, then do a standard date-range join. Using sqldf:
Prep:
library(dplyr)
df11 <- mutate(df11, Date_m30 = Date %m-% days(30), Date_p30 = Date %m+% days(30))
df11
# # A tibble: 8 x 9
# UserID Full.Name Info EncounterID Date Temp misc Date_m30 Date_p30
# <int> <chr> <chr> <int> <dttm> <chr> <chr> <dttm> <dttm>
# 1 1 John Smith yes 13 2021-01-02 00:00:00 100 (null) 2020-12-03 00:00:00 2021-02-01 00:00:00
# 2 2 Jack Peters no 14 2021-01-05 00:00:00 103 no 2020-12-06 00:00:00 2021-02-04 00:00:00
# 3 3 Bob Brown yes 15 2021-01-01 00:00:00 104 (null) 2020-12-02 00:00:00 2021-01-31 00:00:00
# 4 4 Jane Doe yes 16 2021-01-05 00:00:00 103 (null) 2020-12-06 00:00:00 2021-02-04 00:00:00
# 5 5 Jackie Jane yes 17 2021-05-09 00:00:00 101 (null) 2021-04-09 00:00:00 2021-06-08 00:00:00
# 6 6 Sarah Brown yes 18 2021-05-08 00:00:00 102 (null) 2021-04-08 00:00:00 2021-06-07 00:00:00
# 7 7 Chloe Brown no 19 2021-12-12 00:00:00 103 (null) 2021-11-12 00:00:00 2022-01-11 00:00:00
# 8 1 John Smith yes 13 2021-12-11 00:00:00 105 (null) 2021-11-11 00:00:00 2022-01-10 00:00:00
The join:
sqldf::sqldf("
select df11.*, df22.DOB, df22.EncounterDate, df22.Type, df22.responses
from df11
left join df22 on df11.UserID = df22.UserID
and df22.EncounterDate between df11.Date_m30 and df11.Date_p30") %>%
select(-Date_m30, -Date_p30)
# UserID Full.Name Info EncounterID Date Temp misc DOB EncounterDate Type responses
# 1 1 John Smith yes 13 2021-01-01 19:00:00 100 (null) 1/1/90 2020-12-31 19:00:00 Intro (null)
# 2 2 Jack Peters no 14 2021-01-04 19:00:00 103 no 1/10/90 2021-01-01 19:00:00 Intro no
# 3 3 Bob Brown yes 15 2020-12-31 19:00:00 104 (null) 1/2/90 2020-12-31 19:00:00 Intro yes
# 4 4 Jane Doe yes 16 2021-01-04 19:00:00 103 (null) 2/20/80 2021-01-05 19:00:00 Intro no
# 5 5 Jackie Jane yes 17 2021-05-08 20:00:00 101 (null) 2/2/80 2021-05-06 20:00:00 Care no
# 6 6 Sarah Brown yes 18 2021-05-07 20:00:00 102 (null) 12/2/80 2021-05-07 20:00:00 Out unsat
# 7 7 Chloe Brown no 19 2021-12-11 19:00:00 103 (null) <NA> <NA> <NA> <NA>
# 8 1 John Smith yes 13 2021-12-10 19:00:00 105 (null) <NA> <NA> <NA> <NA>
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With