I really hope someone can help me with this question, because I've been struggling for some time. My data looks like this:
ID DATE VAR1 VAR2
01 2018-07-27 0 0
01 2018-07-28 0 0
01 2018-07-29 0 1
01 2018-07-30 0 1
01 2018-07-31 0 1
01 2018-08-01 0 0
02 2018-09-30 1 0
02 2018-10-01 0 0
02 2018-10-02 0 1
02 2018-10-03 1 1
02 2018-10-04 1 1
02 2018-10-05 0 1
02 2018-10-06 0 0
02 2018-10-07 0 0
02 2018-10-08 0 0
02 2018-10-10 0 0
02 2018-10-12 0 0
02 2018-10-13 0 0
02 2018-10-14 0 0
02 2018-10-15 1 0
02 2018-10-18 1 0
02 2018-10-19 0 0
02 2018-10-20 0 0
02 2018-10-26 0 0
02 2018-10-28 0 0
02 2018-11-02 0 1
For each ID, I want to know whether VAR1 was present (1) within +/- 2 days of the first day VAR2 was present. I would like to store the answers in a new data frame, like this:
ID PRESENT
01 0
02 1
Does anyone know how to do this? VAR2 is the menstrual cycle, and for some IDs I have data from multiple menstruations. If VAR1 was present within +/- 2 days of the first day of any one of those menstruations, I want that ID to come out positive (1).
Thanks in advance!
One way to go about it, though there may be a neater approach:
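library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(
    DATE = as.Date(DATE),
    # keep a 1 only on the first day of each VAR2 run (a 1 not preceded by a 1);
    # default = 0L avoids an NA on the first row of each ID
    VAR2 = ifelse(VAR2 == 1 & lag(VAR2, default = 0L) == 1, 0, VAR2),
    # TRUE on a first day when VAR1 == 1 anywhere within +/- 2 days of it
    PRESENT = sapply(DATE,
                     function(x) any(VAR1[between(DATE, x - 2, x + 2)] == 1)) & VAR2 == 1
  ) %>%
  # positive if any menstruation qualifies; unary + turns TRUE/FALSE into 1/0
  summarise(PRESENT = +any(PRESENT))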
Output:
# A tibble: 2 x 2
ID PRESENT
<int> <int>
1 1 0
2 2 1
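The `lag()` step in the code above keeps a 1 in VAR2 only on the first day of each run of 1s, which is how several menstruations per ID are handled. Below is a minimal base-R sketch of that idea on ID 02's VAR2 column, where `c(0L, head(x, -1L))` stands in for dplyr's `lag(x, default = 0L)`. It also shows why a default matters: `lag()` gives NA on the first row of each group when no default is supplied, and that NA propagates whenever VAR2 is 1 on that row.

```r
# ID 02's VAR2 column from the question, in date order
var2 <- c(0L, 0L, 1L, 1L, 1L, 1L, rep(0L, 13L), 1L)

# Previous row's value; the leading 0L plays the role of lag(var2, default = 0L)
prev <- c(0L, head(var2, -1L))

# First day of each VAR2 run: present on this row, absent on the row before
first_day <- var2 == 1L & prev == 0L
which(first_day)
# → 3 20  (2018-10-02 and 2018-11-02)

# Without a default the first lagged value is NA, and TRUE & NA is NA
x <- c(1L, 1L, 0L)
x == 1L & c(NA, head(x, -1L)) == 1L
# → NA TRUE FALSE
```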
Data used:
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), DATE = structure(1:26, .Label = c("2018-07-27", "2018-07-28",
"2018-07-29", "2018-07-30", "2018-07-31", "2018-08-01", "2018-09-30",
"2018-10-01", "2018-10-02", "2018-10-03", "2018-10-04", "2018-10-05",
"2018-10-06", "2018-10-07", "2018-10-08", "2018-10-10", "2018-10-12",
"2018-10-13", "2018-10-14", "2018-10-15", "2018-10-18", "2018-10-19",
"2018-10-20", "2018-10-26", "2018-10-28", "2018-11-02"), class = "factor"),
VAR1 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L),
VAR2 = c(0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L)), class = "data.frame", row.names = c(NA,
-26L))
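For comparison, here is a minimal base-R sketch of the same rule that does not need dplyr. It assumes the rows are already sorted by DATE within each ID and that VAR1/VAR2 are coded 0/1; `onset_present` is a hypothetical helper name, and the data frame rebuilds the example data with DATE as a real Date column instead of a factor.

```r
# Example data from the question, DATE stored as Date (assumed sorted within ID)
df <- data.frame(
  ID = rep(c(1L, 2L), times = c(6L, 20L)),
  DATE = as.Date(c(
    "2018-07-27", "2018-07-28", "2018-07-29", "2018-07-30", "2018-07-31",
    "2018-08-01", "2018-09-30", "2018-10-01", "2018-10-02", "2018-10-03",
    "2018-10-04", "2018-10-05", "2018-10-06", "2018-10-07", "2018-10-08",
    "2018-10-10", "2018-10-12", "2018-10-13", "2018-10-14", "2018-10-15",
    "2018-10-18", "2018-10-19", "2018-10-20", "2018-10-26", "2018-10-28",
    "2018-11-02")),
  VAR1 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 0L,
           0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L),
  VAR2 = c(0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L,
           0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L)
)

# Hypothetical helper: 1L if VAR1 == 1 within +/- `window` days of the first
# day of any VAR2 run for one ID, else 0L
onset_present <- function(d, window = 2) {
  prev <- c(0L, head(d$VAR2, -1L))                        # previous row's VAR2
  starts <- as.numeric(d$DATE[d$VAR2 == 1L & prev == 0L]) # first days, in days
  var1_days <- as.numeric(d$DATE[d$VAR1 == 1L])           # days with VAR1 == 1
  # for each first day, is any VAR1 day at most `window` days away?
  hit <- vapply(starts, function(s) any(abs(var1_days - s) <= window), logical(1))
  as.integer(any(hit))
}

by_id <- split(df, df$ID)
result <- data.frame(
  ID = as.integer(names(by_id)),
  PRESENT = unname(vapply(by_id, onset_present, integer(1)))
)
print(result)
# ID 1 -> PRESENT 0, ID 2 -> PRESENT 1
```

Like the dplyr version, it treats a 1 in VAR2 that follows a 0 on the previous row as the first day of a menstruation, so an ID with several menstruations is positive if any one of them qualifies.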