
R: Create dummy if column includes duplicate given group

Tags: dataframe, r

I would like to create a dummy variable that takes the value 1 if an individual (id) is observed in two or more different age groups, and 0 otherwise.

Could someone show how to do this and explain the approach?

A small example could be:

set.seed(123)
df <- data.frame(id = sample(1:10, 30, replace = TRUE),
                 agegroup = sample(c("5054", "5559", "6065"), 30, replace = TRUE))

And the expected output:

id  agegroup    dummy
 3     6065       1
 8     6065       1
 5     6065       1
 9     6065       1
10     5054       1
 1     5559       0
 6     6065       1
 9     5054       1
 6     5054       1
 5     5054       1
10     5054       1
 5     5559       1
 7     5559       1
 6     5559       1
 2     5054       1
 9     5054       1
 3     5054       1
 1     5559       0
 4     5054       0
10     6065       1
 9     5054       1
 7     5559       1
 7     6065       1
10     5054       1
 7     5559       1
 8     5054       1
 6     5054       1
 6     6065       1
 3     6065       1
 2     5559       1
maaas asked Jun 14 '18 20:06


1 Answer

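One option is to group by id with dplyr::group_by(id) and count the distinct agegroup values within each id using n_distinct(). Your data contains duplicate rows for the same id and agegroup combination, so counting rows per id is not enough; the distinct age groups have to be counted.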
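
To illustrate why the duplicates matter, here is a minimal base R sketch. The data frame `toy` and the names `rows_per_id` and `groups_per_id` are made up for this illustration and are not part of the question's data.

```r
# Hypothetical toy data: id 1 appears twice in the same age group,
# id 2 appears in two different age groups.
toy <- data.frame(id = c(1, 1, 2, 2),
                  agegroup = c("5559", "5559", "5054", "5559"),
                  stringsAsFactors = FALSE)

# Rows per id: both ids have two rows, so a row count cannot tell them apart.
rows_per_id <- tapply(toy$agegroup, toy$id, length)

# Distinct age groups per id: only id 2 has more than one.
groups_per_id <- tapply(toy$agegroup, toy$id, function(x) length(unique(x)))
```

Only the second count separates the id that changes age group from the one that is simply observed twice.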

Edit: Updated with comments from @Henrik

library(dplyr)

df %>% group_by(id) %>%
  mutate(dummy = as.integer(n_distinct(agegroup) > 1))    

# # A tibble: 30 x 3
# # Groups: id [10]
#       id agegroup dummy
#    <int> <fctr>   <int>
#  1     3 6065         1
#  2     8 6065         1
#  3     5 6065         1
#  4     9 6065         1
#  5    10 5054         1
#  6     1 5559         0
#  7     6 6065         1
#  8     9 5054         1
#  9     6 5054         1
# 10     5 5054         1
# # ... with 20 more rows
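
If you prefer not to load dplyr, the same idea can probably be sketched in base R with ave(). This is an alternative I am suggesting, not part of the answer above; it assumes the `df` from the question, and the values drawn by sample() may differ between R versions.

```r
set.seed(123)
df <- data.frame(id = sample(1:10, 30, replace = TRUE),
                 agegroup = sample(c("5054", "5559", "6065"), 30, replace = TRUE))

# Turn agegroup into integer codes so ave() returns a numeric result,
# whether agegroup is stored as character or as a factor.
codes <- as.integer(factor(df$agegroup))

# For every row, the number of distinct age groups observed for that row's id.
n_groups <- ave(codes, df$id, FUN = function(x) length(unique(x)))

# 1 if the id appears in two or more different age groups, 0 otherwise.
df$dummy <- as.integer(n_groups > 1)
```

ave() returns a vector as long as its input, so the result can be assigned directly as a new column without changing the row order.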
MKR answered Sep 22 '22 16:09