I have a data file with one row per participant (numbered 1 to x within the study they took part in). I want to check whether all participants are present in the dataset. This is my toy dataset; personid identifies the participant and study is the study they took part in.
df <- read.table(text = "personid study measurement
1 x 23
2 x 32
1 y 21
3 y 23
4 y 23
6 y 23", header=TRUE)
which looks like this:
personid study measurement
1 1 x 23
2 2 x 32
3 1 y 21
4 3 y 23
5 4 y 23
6 6 y 23
So for y, I am missing participants 2 and 5. How do I check that automatically? I tried adding a counter variable and comparing it to the participant id, but once one participant is missing the comparison is meaningless because the alignment is off.
df %>% group_by(study) %>% mutate(id = 1:n(),check = id==personid)
Source: local data frame [6 x 5]
Groups: study [2]
personid study measurement id check
<int> <fctr> <int> <int> <lgl>
1 1 x 23 1 TRUE
2 2 x 32 2 TRUE
3 1 y 21 1 TRUE
4 3 y 23 2 FALSE
5 4 y 23 3 FALSE
6 6 y 23 4 FALSE
Assuming your personid is sequential, you can do this using setdiff, i.e.
library(dplyr)
df %>%
group_by(study) %>%
mutate(new = toString(setdiff(max(personid):min(personid), personid)))
#Source: local data frame [6 x 4]
#Groups: study [2]
# personid study measurement new
# <int> <fctr> <int> <chr>
#1 1 x 23
#2 2 x 32
#3 1 y 21 5, 2
#4 3 y 23 5, 2
#5 4 y 23 5, 2
#6 6 y 23 5, 2
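If you would rather get the missing ids once per study instead of repeated on every row, the same setdiff idea can be sketched in base R. This is only a sketch under the assumption that ids within each study start at 1 and run up to the largest id present; the name missing_ids is made up for illustration. Note that neither this nor the version above can detect a participant whose id is larger than the largest one in the data.

```r
# Toy data from the question
df <- read.table(text = "personid study measurement
1 x 23
2 x 32
1 y 21
3 y 23
4 y 23
6 y 23", header = TRUE)

# Split the ids by study, then for each study keep the ids in
# 1..max(id) that never appear.
# Assumption: ids start at 1 and are meant to be consecutive.
missing_ids <- lapply(split(df$personid, df$study), function(ids) {
  setdiff(seq_len(max(ids)), ids)
})

missing_ids
```

For the toy data this gives an empty vector for study x and the ids 2 and 5 for study y. If you know how many participants each study should have, replace max(ids) with that number so that missing ids at the end of the range are caught as well.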