I have a data of repeated measurements by day (like a glucose monitor). Each individual can have wear the device during period of times, so an individual can have repeated measurements for 1 week and others for 6 months. Then each individual will have a column with a visit date. Some individuals can have more than one visit (tops 3 visits).
My data looks like this (apologies but I don't know how to generate a simulated data)
datos
id visit_Date time value
1 2020-03-06 2022-03-23 17:09:00 10
1 2020-03-06 2022-03-23 17:14:00 11
1 2020-03-06 2022-03-23 17:19:00 11
1 2020-03-06 2022-03-23 17:24:00 12
1 2020-03-06 2022-03-23 17:29:00 11
2 2018-03-30 2022-03-24 17:22:00 13
2 2018-03-30 2022-03-24 17:27:00 10
2 2018-03-30 2022-03-24 17:32:00 11
2 2018-03-30 2022-03-24 17:37:00 12
Some individuals can wear the monitoring device before the visit_Date
and others wear the monitor only after the visit_Date
. What I need it is to extract the id, visit_Date
, time
, and value
in an interval of 10 days but closest to the visit_Date
. Ideally it should be after the visit date but some individuals don't wear it long enough after the visit, in that case, an interval overlapping the visit_Date
should be fine.
What I have done so far it is getting the closest date by grouping by id
and visit_Date
, arranging by descending time
and then filtering by time + 10 days
. Here is the code I am using:
library(lubridate)
datos %>%
dplyr::filter(time >= visit_Date) %>%
group_by(id, visit_Date) %>%
arrange(desc(time)) %>%
dplyr::filter(time < time + 10) %>%
summarise(min = min(time), max = max(time))
I am using summarise to see how many days I get in total after filtering, however this filtering is only for time
after the visit date, and some individuals don't wear the device for 10 days after the visit. I guess I could repeat the same filtering process for time <= visit_Date
for those individuals with less than 10 days, and then merge both datasets but I was wondering if there is another way of doing it
My desired output will be all the rows for each individual with the 10 interval days closest to the visit_Date
, ideally after visit_Date
but if not overlapping visit_Date
. Here it is an example output (bear in mind that my real date has in the column time
a data point for every 5 minutes.
Desired output (example with 2 individuals)
> datos
id visit_Date time value
1 2020-03-06 2022-02-22 17:09:00 10
1 2020-03-06 2022-02-26 17:09:00 10
1 2020-03-06 2022-02-26 17:14:00 11
1 2020-03-06 2022-02-28 17:19:00 11
1 2020-03-06 2022-03-07 17:24:00 12
1 2020-03-06 2022-03-07 17:29:00 11
2 2018-03-30 2022-03-24 17:22:00 13
2 2018-03-30 2022-03-30 17:27:00 10
2 2018-03-30 2022-04-10 17:32:00 11
2 2018-03-30 2022-04-24 17:37:00 12
2 2018-03-30 2022-04-26 17:37:00 12
# Desired output
id visit_Date time value
1 2020-03-06 2022-02-26 17:09:00 10
1 2020-03-06 2022-02-26 17:14:00 11
1 2020-03-06 2022-02-28 17:19:00 11
1 2020-03-06 2022-03-07 17:24:00 12
1 2020-03-06 2022-03-07 17:29:00 11
2 2018-03-30 2022-03-30 17:27:00 10
2 2018-03-30 2022-04-10 17:32:00 11
In this example, the individual id1 didn't wear the device longer than 10 days after the visit date, so the output should include date from before the visit but only the 10 days-interval closest to the visit_Date
. In the case of id2 the individual wore the device longer than 10 days after the visit_Date
and we select the 10 days closest to the visit_Date
but after or equal to visit_Date
Thanks
For an id, we can check if difftime
from visit_date
exceeds ten days. If so we can easily subset by those values that are TRUE
. Else, conversely we do this for differences greater that minus ten days.
> by(dat, dat$id, \(x) {
+ if (any(d10 <- difftime(x$visit_Date, as.Date(x$time), unit='days') > 10)) {
+ x[d10, ]
+ } else {
+ x[difftime(x$visit_Date, as.Date(x$time), unit='days') > -10, ]
+ }
+ }) |> do.call(what='rbind')
id visit_Date time value u
1.1 1 2022-01-01 02:27:41 2022-01-01 02:27:41 12 0
1.2 1 2022-01-01 02:27:41 2022-01-01 14:27:41 12 0
1.3 1 2022-01-01 02:27:41 2022-01-02 02:27:41 12 0
1.4 1 2022-01-01 02:27:41 2022-01-02 14:27:41 10 0
1.5 1 2022-01-01 02:27:41 2022-01-03 02:27:41 13 0
1.6 1 2022-01-01 02:27:41 2022-01-03 14:27:41 10 0
1.7 1 2022-01-01 02:27:41 2022-01-04 02:27:41 10 0
1.8 1 2022-01-01 02:27:41 2022-01-04 14:27:41 12 0
1.9 1 2022-01-01 02:27:41 2022-01-05 02:27:41 10 0
1.10 1 2022-01-01 02:27:41 2022-01-05 14:27:41 12 0
1.11 1 2022-01-01 02:27:41 2022-01-06 02:27:41 11 0
1.12 1 2022-01-01 02:27:41 2022-01-06 14:27:41 12 0
1.13 1 2022-01-01 02:27:41 2022-01-07 02:27:41 12 0
1.14 1 2022-01-01 02:27:41 2022-01-07 14:27:41 10 0
1.15 1 2022-01-01 02:27:41 2022-01-08 02:27:41 9 0
1.16 1 2022-01-01 02:27:41 2022-01-08 14:27:41 10 0
1.17 1 2022-01-01 02:27:41 2022-01-09 02:27:41 12 0
1.18 1 2022-01-01 02:27:41 2022-01-09 14:27:41 13 0
1.19 1 2022-01-01 02:27:41 2022-01-10 02:27:41 11 0
1.20 1 2022-01-01 02:27:41 2022-01-10 14:27:41 11 0
1.21 1 2022-01-01 02:27:41 2022-01-11 02:27:41 10 0
1.22 1 2022-01-01 02:27:41 2022-01-11 14:27:41 11 0
2.26 2 2022-12-04 17:21:57 2022-12-01 17:21:57 11 -3
2.27 2 2022-12-04 17:21:57 2022-12-02 05:21:57 11 -3
2.28 2 2022-12-04 17:21:57 2022-12-02 17:21:57 11 -3
2.29 2 2022-12-04 17:21:57 2022-12-03 05:21:57 11 -3
2.30 2 2022-12-04 17:21:57 2022-12-03 17:21:57 10 -3
2.31 2 2022-12-04 17:21:57 2022-12-04 05:21:57 12 -3
2.32 2 2022-12-04 17:21:57 2022-12-04 17:21:57 11 -3
2.33 2 2022-12-04 17:21:57 2022-12-05 05:21:57 12 -3
2.34 2 2022-12-04 17:21:57 2022-12-05 17:21:57 12 -3
2.35 2 2022-12-04 17:21:57 2022-12-06 05:21:57 13 -3
2.36 2 2022-12-04 17:21:57 2022-12-06 17:21:57 12 -3
2.37 2 2022-12-04 17:21:57 2022-12-07 05:21:57 10 -3
2.38 2 2022-12-04 17:21:57 2022-12-07 17:21:57 10 -3
2.39 2 2022-12-04 17:21:57 2022-12-08 05:21:57 12 -3
2.40 2 2022-12-04 17:21:57 2022-12-08 17:21:57 12 -3
2.41 2 2022-12-04 17:21:57 2022-12-09 05:21:57 13 -3
2.42 2 2022-12-04 17:21:57 2022-12-09 17:21:57 12 -3
2.43 2 2022-12-04 17:21:57 2022-12-10 05:21:57 10 -3
2.44 2 2022-12-04 17:21:57 2022-12-10 17:21:57 10 -3
2.45 2 2022-12-04 17:21:57 2022-12-11 05:21:57 10 -3
2.46 2 2022-12-04 17:21:57 2022-12-11 17:21:57 12 -3
2.47 2 2022-12-04 17:21:57 2022-12-12 05:21:57 10 -3
2.48 2 2022-12-04 17:21:57 2022-12-12 17:21:57 10 -3
2.49 2 2022-12-04 17:21:57 2022-12-13 05:21:57 9 -3
2.50 2 2022-12-04 17:21:57 2022-12-13 17:21:57 9 -3
3.51 3 2022-01-31 11:13:21 2022-01-28 11:13:21 12 -3
3.52 3 2022-01-31 11:13:21 2022-01-28 23:13:21 13 -3
3.53 3 2022-01-31 11:13:21 2022-01-29 11:13:21 11 -3
3.54 3 2022-01-31 11:13:21 2022-01-29 23:13:21 12 -3
3.55 3 2022-01-31 11:13:21 2022-01-30 11:13:21 11 -3
3.56 3 2022-01-31 11:13:21 2022-01-30 23:13:21 10 -3
3.57 3 2022-01-31 11:13:21 2022-01-31 11:13:21 11 -3
3.58 3 2022-01-31 11:13:21 2022-01-31 23:13:21 13 -3
3.59 3 2022-01-31 11:13:21 2022-02-01 11:13:21 11 -3
3.60 3 2022-01-31 11:13:21 2022-02-01 23:13:21 10 -3
3.61 3 2022-01-31 11:13:21 2022-02-02 11:13:21 10 -3
3.62 3 2022-01-31 11:13:21 2022-02-02 23:13:21 11 -3
3.63 3 2022-01-31 11:13:21 2022-02-03 11:13:21 12 -3
3.64 3 2022-01-31 11:13:21 2022-02-03 23:13:21 10 -3
3.65 3 2022-01-31 11:13:21 2022-02-04 11:13:21 9 -3
3.66 3 2022-01-31 11:13:21 2022-02-04 23:13:21 11 -3
3.67 3 2022-01-31 11:13:21 2022-02-05 11:13:21 12 -3
3.68 3 2022-01-31 11:13:21 2022-02-05 23:13:21 11 -3
3.69 3 2022-01-31 11:13:21 2022-02-06 11:13:21 10 -3
3.70 3 2022-01-31 11:13:21 2022-02-06 23:13:21 11 -3
3.71 3 2022-01-31 11:13:21 2022-02-07 11:13:21 13 -3
3.72 3 2022-01-31 11:13:21 2022-02-07 23:13:21 11 -3
3.73 3 2022-01-31 11:13:21 2022-02-08 11:13:21 10 -3
3.74 3 2022-01-31 11:13:21 2022-02-08 23:13:21 11 -3
3.75 3 2022-01-31 11:13:21 2022-02-09 11:13:21 13 -3
4.76 4 2022-03-01 01:40:46 2022-02-17 01:40:46 10 -12
4.77 4 2022-03-01 01:40:46 2022-02-17 13:40:46 9 -12
4.78 4 2022-03-01 01:40:46 2022-02-18 01:40:46 10 -12
4.79 4 2022-03-01 01:40:46 2022-02-18 13:40:46 10 -12
4.80 4 2022-03-01 01:40:46 2022-02-19 01:40:46 10 -12
4.81 4 2022-03-01 01:40:46 2022-02-19 13:40:46 10 -12
5.101 5 2022-07-23 18:27:11 2022-07-11 18:27:11 12 -12
5.102 5 2022-07-23 18:27:11 2022-07-12 06:27:11 9 -12
5.103 5 2022-07-23 18:27:11 2022-07-12 18:27:11 11 -12
5.104 5 2022-07-23 18:27:11 2022-07-13 06:27:11 11 -12
5.105 5 2022-07-23 18:27:11 2022-07-13 18:27:11 11 -12
If you want to group by
both id and visit_Date, you are looking for the interaction
.
> by(dat, with(dat, interaction(id, visit_Date)), \(x) {
+ if (any(d10 <- difftime(x$visit_Date, as.Date(x$time), unit='days') > 10)) {
+ x[d10, ]
+ } else {
+ x[difftime(x$visit_Date, as.Date(x$time), unit='days') > -10, ]
+ }
+ }) |> do.call(what='rbind')
Data:
Simulated data for a random date in 2022, added 12 days and randomly subtracted 3, 0 or 12 days. left the u
column in example data for orientation. Instead of 5 min intervals I used 12 hours for sake of brevity.
set.seed(42)
dat <- lapply(seq_len(5), \(x) {
visit_Date <- as.POSIXct(sample.int(31535700, 1), origin='2022-01-01')
rnd <- runif(1L)
if (rnd > .66) {
u <- -3
} else if (rnd > .33) {
u <- 0
} else {
u <- -12
}
time <- seq.POSIXt(visit_Date, visit_Date + 12*60*60*24, by='12 hour') +
u*60*60*24
list(id=x,
visit_Date=visit_Date,
time=time,
value=round(runif(length(time), 9, 13)),
u=u)
}) |> lapply(data.frame) |> do.call(what='rbind')
Gives
> by(dat, ~id, tail, 3)
id: 1
id visit_Date time value u
23 1 2022-02-07 03:52:21 2022-02-06 03:52:21 11 -12
24 1 2022-02-07 03:52:21 2022-02-06 15:52:21 11 -12
25 1 2022-02-07 03:52:21 2022-02-07 03:52:21 13 -12
---------------------------------------------------------------------------------------------------
id: 2
id visit_Date time value u
48 2 2022-03-24 14:17:57 2022-04-01 15:17:57 12 -3
49 2 2022-03-24 14:17:57 2022-04-02 03:17:57 9 -3
50 2 2022-03-24 14:17:57 2022-04-02 15:17:57 12 -3
---------------------------------------------------------------------------------------------------
id: 3
id visit_Date time value u
73 3 2022-09-26 17:29:28 2022-09-25 17:29:28 10 -12
74 3 2022-09-26 17:29:28 2022-09-26 05:29:28 10 -12
75 3 2022-09-26 17:29:28 2022-09-26 17:29:28 12 -12
---------------------------------------------------------------------------------------------------
id: 4
id visit_Date time value u
98 4 2022-04-29 20:09:16 2022-04-28 20:09:16 9 -12
99 4 2022-04-29 20:09:16 2022-04-29 08:09:16 11 -12
100 4 2022-04-29 20:09:16 2022-04-29 20:09:16 12 -12
---------------------------------------------------------------------------------------------------
id: 5
id visit_Date time value u
123 5 2022-03-17 11:28:55 2022-03-28 12:28:55 9 0
124 5 2022-03-17 11:28:55 2022-03-29 00:28:55 12 0
125 5 2022-03-17 11:28:55 2022-03-29 12:28:55 11 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With