Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to extract the 10 days interval closest to a data specified in another column r

Tags:

r

lubridate

I have a data of repeated measurements by day (like a glucose monitor). Each individual can have wear the device during period of times, so an individual can have repeated measurements for 1 week and others for 6 months. Then each individual will have a column with a visit date. Some individuals can have more than one visit (tops 3 visits).

My data looks like this (apologies but I don't know how to generate a simulated data)

datos

id  visit_Date   time                 value
1   2020-03-06   2022-03-23 17:09:00  10    
1   2020-03-06   2022-03-23 17:14:00  11   
1   2020-03-06   2022-03-23 17:19:00  11   
1   2020-03-06   2022-03-23 17:24:00  12   
1   2020-03-06   2022-03-23 17:29:00  11   
2   2018-03-30   2022-03-24 17:22:00  13   
2   2018-03-30   2022-03-24 17:27:00  10
2   2018-03-30   2022-03-24 17:32:00  11
2   2018-03-30   2022-03-24 17:37:00  12

Some individuals can wear the monitoring device before the visit_Date and others wear the monitor only after the visit_Date. What I need it is to extract the id, visit_Date, time, and value in an interval of 10 days but closest to the visit_Date. Ideally it should be after the visit date but some individuals don't wear it long enough after the visit, in that case, an interval overlapping the visit_Date should be fine.

What I have done so far it is getting the closest date by grouping by id and visit_Date, arranging by descending time and then filtering by time + 10 days. Here is the code I am using:

library(lubridate)

datos %>% 
   dplyr::filter(time >= visit_Date) %>%
   group_by(id, visit_Date) %>%
   arrange(desc(time)) %>%
   dplyr::filter(time < time + 10) %>%
   summarise(min = min(time), max = max(time))

I am using summarise to see how many days I get in total after filtering, however this filtering is only for time after the visit date, and some individuals don't wear the device for 10 days after the visit. I guess I could repeat the same filtering process for time <= visit_Date for those individuals with less than 10 days, and then merge both datasets but I was wondering if there is another way of doing it

EDIT

My desired output will be all the rows for each individual with the 10 interval days closest to the visit_Date, ideally after visit_Date but if not overlapping visit_Date. Here it is an example output (bear in mind that my real date has in the column time a data point for every 5 minutes.

Desired output (example with 2 individuals)

> datos

id  visit_Date   time                 value
1   2020-03-06   2022-02-22 17:09:00  10    
1   2020-03-06   2022-02-26 17:09:00  10    
1   2020-03-06   2022-02-26 17:14:00  11   
1   2020-03-06   2022-02-28 17:19:00  11   
1   2020-03-06   2022-03-07 17:24:00  12   
1   2020-03-06   2022-03-07 17:29:00  11   
2   2018-03-30   2022-03-24 17:22:00  13   
2   2018-03-30   2022-03-30 17:27:00  10
2   2018-03-30   2022-04-10 17:32:00  11
2   2018-03-30   2022-04-24 17:37:00  12
2   2018-03-30   2022-04-26 17:37:00  12

# Desired output

id  visit_Date   time                 value
1   2020-03-06   2022-02-26 17:09:00  10    
1   2020-03-06   2022-02-26 17:14:00  11   
1   2020-03-06   2022-02-28 17:19:00  11   
1   2020-03-06   2022-03-07 17:24:00  12   
1   2020-03-06   2022-03-07 17:29:00  11   
2   2018-03-30   2022-03-30 17:27:00  10
2   2018-03-30   2022-04-10 17:32:00  11

In this example, the individual id1 didn't wear the device longer than 10 days after the visit date, so the output should include date from before the visit but only the 10 days-interval closest to the visit_Date. In the case of id2 the individual wore the device longer than 10 days after the visit_Date and we select the 10 days closest to the visit_Date but after or equal to visit_Date

Thanks

like image 920
user2380782 Avatar asked Oct 15 '25 19:10

user2380782


1 Answers

For an id, we can check if difftime from visit_date exceeds ten days. If so we can easily subset by those values that are TRUE. Else, conversely we do this for differences greater that minus ten days.

> by(dat, dat$id, \(x) {
+   if (any(d10 <- difftime(x$visit_Date, as.Date(x$time), unit='days') > 10)) {
+     x[d10, ]
+   } else {
+     x[difftime(x$visit_Date, as.Date(x$time), unit='days') > -10, ]
+   }
+ }) |> do.call(what='rbind')
      id          visit_Date                time value   u
1.1    1 2022-01-01 02:27:41 2022-01-01 02:27:41    12   0
1.2    1 2022-01-01 02:27:41 2022-01-01 14:27:41    12   0
1.3    1 2022-01-01 02:27:41 2022-01-02 02:27:41    12   0
1.4    1 2022-01-01 02:27:41 2022-01-02 14:27:41    10   0
1.5    1 2022-01-01 02:27:41 2022-01-03 02:27:41    13   0
1.6    1 2022-01-01 02:27:41 2022-01-03 14:27:41    10   0
1.7    1 2022-01-01 02:27:41 2022-01-04 02:27:41    10   0
1.8    1 2022-01-01 02:27:41 2022-01-04 14:27:41    12   0
1.9    1 2022-01-01 02:27:41 2022-01-05 02:27:41    10   0
1.10   1 2022-01-01 02:27:41 2022-01-05 14:27:41    12   0
1.11   1 2022-01-01 02:27:41 2022-01-06 02:27:41    11   0
1.12   1 2022-01-01 02:27:41 2022-01-06 14:27:41    12   0
1.13   1 2022-01-01 02:27:41 2022-01-07 02:27:41    12   0
1.14   1 2022-01-01 02:27:41 2022-01-07 14:27:41    10   0
1.15   1 2022-01-01 02:27:41 2022-01-08 02:27:41     9   0
1.16   1 2022-01-01 02:27:41 2022-01-08 14:27:41    10   0
1.17   1 2022-01-01 02:27:41 2022-01-09 02:27:41    12   0
1.18   1 2022-01-01 02:27:41 2022-01-09 14:27:41    13   0
1.19   1 2022-01-01 02:27:41 2022-01-10 02:27:41    11   0
1.20   1 2022-01-01 02:27:41 2022-01-10 14:27:41    11   0
1.21   1 2022-01-01 02:27:41 2022-01-11 02:27:41    10   0
1.22   1 2022-01-01 02:27:41 2022-01-11 14:27:41    11   0
2.26   2 2022-12-04 17:21:57 2022-12-01 17:21:57    11  -3
2.27   2 2022-12-04 17:21:57 2022-12-02 05:21:57    11  -3
2.28   2 2022-12-04 17:21:57 2022-12-02 17:21:57    11  -3
2.29   2 2022-12-04 17:21:57 2022-12-03 05:21:57    11  -3
2.30   2 2022-12-04 17:21:57 2022-12-03 17:21:57    10  -3
2.31   2 2022-12-04 17:21:57 2022-12-04 05:21:57    12  -3
2.32   2 2022-12-04 17:21:57 2022-12-04 17:21:57    11  -3
2.33   2 2022-12-04 17:21:57 2022-12-05 05:21:57    12  -3
2.34   2 2022-12-04 17:21:57 2022-12-05 17:21:57    12  -3
2.35   2 2022-12-04 17:21:57 2022-12-06 05:21:57    13  -3
2.36   2 2022-12-04 17:21:57 2022-12-06 17:21:57    12  -3
2.37   2 2022-12-04 17:21:57 2022-12-07 05:21:57    10  -3
2.38   2 2022-12-04 17:21:57 2022-12-07 17:21:57    10  -3
2.39   2 2022-12-04 17:21:57 2022-12-08 05:21:57    12  -3
2.40   2 2022-12-04 17:21:57 2022-12-08 17:21:57    12  -3
2.41   2 2022-12-04 17:21:57 2022-12-09 05:21:57    13  -3
2.42   2 2022-12-04 17:21:57 2022-12-09 17:21:57    12  -3
2.43   2 2022-12-04 17:21:57 2022-12-10 05:21:57    10  -3
2.44   2 2022-12-04 17:21:57 2022-12-10 17:21:57    10  -3
2.45   2 2022-12-04 17:21:57 2022-12-11 05:21:57    10  -3
2.46   2 2022-12-04 17:21:57 2022-12-11 17:21:57    12  -3
2.47   2 2022-12-04 17:21:57 2022-12-12 05:21:57    10  -3
2.48   2 2022-12-04 17:21:57 2022-12-12 17:21:57    10  -3
2.49   2 2022-12-04 17:21:57 2022-12-13 05:21:57     9  -3
2.50   2 2022-12-04 17:21:57 2022-12-13 17:21:57     9  -3
3.51   3 2022-01-31 11:13:21 2022-01-28 11:13:21    12  -3
3.52   3 2022-01-31 11:13:21 2022-01-28 23:13:21    13  -3
3.53   3 2022-01-31 11:13:21 2022-01-29 11:13:21    11  -3
3.54   3 2022-01-31 11:13:21 2022-01-29 23:13:21    12  -3
3.55   3 2022-01-31 11:13:21 2022-01-30 11:13:21    11  -3
3.56   3 2022-01-31 11:13:21 2022-01-30 23:13:21    10  -3
3.57   3 2022-01-31 11:13:21 2022-01-31 11:13:21    11  -3
3.58   3 2022-01-31 11:13:21 2022-01-31 23:13:21    13  -3
3.59   3 2022-01-31 11:13:21 2022-02-01 11:13:21    11  -3
3.60   3 2022-01-31 11:13:21 2022-02-01 23:13:21    10  -3
3.61   3 2022-01-31 11:13:21 2022-02-02 11:13:21    10  -3
3.62   3 2022-01-31 11:13:21 2022-02-02 23:13:21    11  -3
3.63   3 2022-01-31 11:13:21 2022-02-03 11:13:21    12  -3
3.64   3 2022-01-31 11:13:21 2022-02-03 23:13:21    10  -3
3.65   3 2022-01-31 11:13:21 2022-02-04 11:13:21     9  -3
3.66   3 2022-01-31 11:13:21 2022-02-04 23:13:21    11  -3
3.67   3 2022-01-31 11:13:21 2022-02-05 11:13:21    12  -3
3.68   3 2022-01-31 11:13:21 2022-02-05 23:13:21    11  -3
3.69   3 2022-01-31 11:13:21 2022-02-06 11:13:21    10  -3
3.70   3 2022-01-31 11:13:21 2022-02-06 23:13:21    11  -3
3.71   3 2022-01-31 11:13:21 2022-02-07 11:13:21    13  -3
3.72   3 2022-01-31 11:13:21 2022-02-07 23:13:21    11  -3
3.73   3 2022-01-31 11:13:21 2022-02-08 11:13:21    10  -3
3.74   3 2022-01-31 11:13:21 2022-02-08 23:13:21    11  -3
3.75   3 2022-01-31 11:13:21 2022-02-09 11:13:21    13  -3
4.76   4 2022-03-01 01:40:46 2022-02-17 01:40:46    10 -12
4.77   4 2022-03-01 01:40:46 2022-02-17 13:40:46     9 -12
4.78   4 2022-03-01 01:40:46 2022-02-18 01:40:46    10 -12
4.79   4 2022-03-01 01:40:46 2022-02-18 13:40:46    10 -12
4.80   4 2022-03-01 01:40:46 2022-02-19 01:40:46    10 -12
4.81   4 2022-03-01 01:40:46 2022-02-19 13:40:46    10 -12
5.101  5 2022-07-23 18:27:11 2022-07-11 18:27:11    12 -12
5.102  5 2022-07-23 18:27:11 2022-07-12 06:27:11     9 -12
5.103  5 2022-07-23 18:27:11 2022-07-12 18:27:11    11 -12
5.104  5 2022-07-23 18:27:11 2022-07-13 06:27:11    11 -12
5.105  5 2022-07-23 18:27:11 2022-07-13 18:27:11    11 -12

Update

If you want to group by both id and visit_Date, you are looking for the interaction.

> by(dat, with(dat, interaction(id, visit_Date)), \(x) {
+   if (any(d10 <- difftime(x$visit_Date, as.Date(x$time), unit='days') > 10)) {
+     x[d10, ]
+   } else {
+     x[difftime(x$visit_Date, as.Date(x$time), unit='days') > -10, ]
+   }
+ }) |> do.call(what='rbind')

Data:

Simulated data for a random date in 2022, added 12 days and randomly subtracted 3, 0 or 12 days. left the u column in example data for orientation. Instead of 5 min intervals I used 12 hours for sake of brevity.

set.seed(42)
dat <- lapply(seq_len(5), \(x) {
  visit_Date <- as.POSIXct(sample.int(31535700, 1), origin='2022-01-01')
  rnd <- runif(1L)
  if (rnd > .66) {
    u <- -3
  } else if (rnd > .33) {
    u <- 0
  } else {
    u <- -12
  }
  time <- seq.POSIXt(visit_Date, visit_Date + 12*60*60*24, by='12 hour') +
    u*60*60*24
  list(id=x,
       visit_Date=visit_Date, 
       time=time,
       value=round(runif(length(time), 9, 13)),
       u=u)
}) |> lapply(data.frame) |> do.call(what='rbind')

Gives

> by(dat, ~id, tail, 3)
id: 1
   id          visit_Date                time value   u
23  1 2022-02-07 03:52:21 2022-02-06 03:52:21    11 -12
24  1 2022-02-07 03:52:21 2022-02-06 15:52:21    11 -12
25  1 2022-02-07 03:52:21 2022-02-07 03:52:21    13 -12
--------------------------------------------------------------------------------------------------- 
id: 2
   id          visit_Date                time value  u
48  2 2022-03-24 14:17:57 2022-04-01 15:17:57    12 -3
49  2 2022-03-24 14:17:57 2022-04-02 03:17:57     9 -3
50  2 2022-03-24 14:17:57 2022-04-02 15:17:57    12 -3
--------------------------------------------------------------------------------------------------- 
id: 3
   id          visit_Date                time value   u
73  3 2022-09-26 17:29:28 2022-09-25 17:29:28    10 -12
74  3 2022-09-26 17:29:28 2022-09-26 05:29:28    10 -12
75  3 2022-09-26 17:29:28 2022-09-26 17:29:28    12 -12
--------------------------------------------------------------------------------------------------- 
id: 4
    id          visit_Date                time value   u
98   4 2022-04-29 20:09:16 2022-04-28 20:09:16     9 -12
99   4 2022-04-29 20:09:16 2022-04-29 08:09:16    11 -12
100  4 2022-04-29 20:09:16 2022-04-29 20:09:16    12 -12
--------------------------------------------------------------------------------------------------- 
id: 5
    id          visit_Date                time value u
123  5 2022-03-17 11:28:55 2022-03-28 12:28:55     9 0
124  5 2022-03-17 11:28:55 2022-03-29 00:28:55    12 0
125  5 2022-03-17 11:28:55 2022-03-29 12:28:55    11 0
like image 75
jay.sf Avatar answered Oct 18 '25 10:10

jay.sf