I would like to create a dummy variable that takes the value 1 if an individual is observed in two or more different age groups and 0 otherwise.
Could someone show me how to do that and explain how it works?
A small example could be:
set.seed(123)
df <- data.frame(id = sample(1:10, 30, replace = TRUE),
agegroup = sample(c("5054", "5559", "6065"), 30, replace = TRUE))
And expected output:
id agegroup dummy
3 6065 1
8 6065 1
5 6065 1
9 6065 1
10 5054 1
1 5559 0
6 6065 1
9 5054 1
6 5054 1
5 5054 1
10 5054 1
5 5559 1
7 5559 1
6 5559 1
2 5054 1
9 5054 1
3 5054 1
1 5559 0
4 5054 0
10 6065 1
9 5054 1
7 5559 1
7 6065 1
10 5054 1
7 5559 1
8 5054 1
6 5054 1
6 6065 1
3 6065 1
2 5559 1
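An option is to use dplyr::group_by(id) and count the unique values of agegroup within each id. Simply counting rows per id is not enough, because your data contains duplicate rows for the same id and agegroup combination (for example, id 1 appears twice, both times in 5559, so it should get 0).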
Edit: Updated with comments from @Henrik
library(dplyr)
# For each id, count the distinct age groups and flag ids with more than one
df %>% group_by(id) %>%
  mutate(dummy = as.integer(n_distinct(agegroup) > 1))
# # A tibble: 30 x 3
# # Groups: id [10]
# id agegroup dummy
# <int> <fctr> <int>
# 1 3 6065 1
# 2 8 6065 1
# 3 5 6065 1
# 4 9 6065 1
# 5 10 5054 1
# 6 1 5559 0
# 7 6 6065 1
# 8 9 5054 1
# 9 6 5054 1
# 10 5 5054 1
# # ... with 20 more rows
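If dplyr is not available, the same idea can be sketched in base R with ave(), which applies a function within each id and returns a vector aligned with the original rows. This is an alternative technique, not part of the answer above, and df_small is a hypothetical hand-made data frame rather than the question's sampled data. It also contrasts counting rows with counting distinct age groups: id 1 has two rows in the same group, so the row count wrongly flags it, while the distinct count gives dummy = 0, 0, 1, 1, 0.

```r
# Hypothetical example data (not the question's sampled data):
# id 1 has two rows in the same age group, id 2 spans two groups,
# id 3 appears only once.
df_small <- data.frame(
  id       = c(1, 1, 2, 2, 3),
  agegroup = c("5559", "5559", "5054", "5559", "5054")
)

# Naive approach: number of rows per id. This wrongly flags id 1,
# because its two rows are duplicates of the same (id, agegroup) pair.
rows_per_id <- ave(seq_along(df_small$id), df_small$id, FUN = length)

# Correct approach: number of distinct age groups per id.
# agegroup is converted to integer codes so ave() returns a numeric vector
# (ave() on a character vector would return character results).
groups_per_id <- ave(as.integer(factor(df_small$agegroup)), df_small$id,
                     FUN = function(x) length(unique(x)))

# 1 if the id is observed in two or more age groups, 0 otherwise
df_small$dummy <- as.integer(groups_per_id > 1)
print(df_small)
```

Converting agegroup with factor() works whether data.frame() stored it as a factor (R < 4.0) or as a character vector (R >= 4.0).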