I want to replace duplicated elements within a group. My data:
df <- data.frame(A=c("a", "a", "a", "b", "b", "c"), group = c(1, 1, 2, 2, 2, 3))
I want to keep the first element of the group, while replacing anything else with NA. Something like:
library(dplyr)

df <- df %>%
  group_by(group) %>%
  mutate(B = first(A))
This doesn't produce what I want, because first(A) fills every row of the group with its first value. What I want instead is B <- c("a", NA, "a", NA, NA, "c")
Use replace with duplicated:
df %>% group_by(group) %>% mutate(B = replace(A, duplicated(A), NA))
# A tibble: 6 x 3
# Groups:   group [3]
#        A group      B
#   <fctr> <dbl> <fctr>
#1      a     1      a
#2      a     1     NA
#3      a     2      a
#4      b     2      b
#5      b     2     NA
#6      c     3      c
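Note that duplicated(A) only flags repeats of a value already seen within the group, so the first "b" in group 2 survives. A minimal base-R sketch of that difference, using the group 2 values copied by hand into a vector (the names a_group2, by_value and by_position are made up for illustration):

```r
# Values of A in group 2 of the example data
a_group2 <- c("a", "b", "b")

# duplicated() flags only repeats of a value that has already appeared
dup_flags <- duplicated(a_group2)

# Blanking the flagged positions keeps the first "a" AND the first "b"
by_value <- replace(a_group2, dup_flags, NA)

# Blanking by position keeps only the very first element
by_position <- replace(a_group2, seq_along(a_group2) > 1, NA)

print(by_value)
print(by_position)
```

If the goal is "first row of each group only", as in the question, the position-based variants are the ones that match.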
Or, if you want to keep only the first element of each group:
df %>%
group_by(group) %>%
mutate(B = ifelse(row_number() == 1, as.character(A), NA))
# A tibble: 6 x 3
# Groups:   group [3]
#        A group     B
#   <fctr> <dbl> <chr>
#1      a     1     a
#2      a     1  <NA>
#3      a     2     a
#4      b     2  <NA>
#5      b     2  <NA>
#6      c     3     c
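The as.character() call matters when A is a factor, which the <fctr> header in the printed output suggests (factors were the default for data.frame() strings before R 4.0). A small base-R sketch of what ifelse() returns with and without the conversion; the factor f and the flag vector is_first are made up for illustration and are not the data above:

```r
# A hypothetical factor column and a "first row" flag
f <- factor(c("a", "a", "b"))
is_first <- c(TRUE, FALSE, FALSE)

# ifelse() drops the factor attributes, so this yields integer codes, not labels
codes <- ifelse(is_first, f, NA)

# Converting to character first keeps the labels
labels <- ifelse(is_first, as.character(f), NA)

print(codes)
print(labels)
```

replace() does not have this problem, because it assigns into the original vector and so keeps its type.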
Or use replace:
df %>%
group_by(group) %>%
mutate(B = replace(A, row_number() > 1, NA))
# A tibble: 6 x 3
# Groups:   group [3]
#        A group      B
#   <fctr> <dbl> <fctr>
#1      a     1      a
#2      a     1     NA
#3      a     2     NA
#4      b     2     NA
#5      b     2     NA
#6      c     3      c
In data.table you could do:
library(data.table)
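# as.character() keeps the labels if A is a factor (c() on a factor can return integer codes)
setDT(df)[, B := c(as.character(A[1]), rep(NA, .N - 1)), by = group]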
Or the same logic in dplyr:
library(dplyr)
df %>% group_by(group) %>% mutate(B = c(as.character(A[1]), rep(NA, n() - 1)))
# A tibble: 6 x 3
# Groups:   group [3]
#        A group     B
#   <fctr> <dbl> <chr>
#1      a     1     a
#2      a     1  <NA>
#3      a     2     a
#4      b     2  <NA>
#5      b     2  <NA>
#6      c     3     c
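For comparison, the same column can also be built without any packages. This is only a sketch and not part of the answers above; it assumes the goal is to keep the first row of each group, and the helper name not_first is made up for illustration:

```r
df <- data.frame(A = c("a", "a", "a", "b", "b", "c"),
                 group = c(1, 1, 2, 2, 2, 3))

# duplicated(df$group) is FALSE on the first row of each group and TRUE afterwards
not_first <- duplicated(df$group)

# as.character() guards against A being a factor (the default before R 4.0)
df$B <- replace(as.character(df$A), not_first, NA)

print(df)
```

Because duplicated() tracks every value seen so far, this also works when the rows of a group are not adjacent.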