I have a data frame with two columns: id and gradelist.
Each value in the gradelist column holds a list of grades (separated by ;), and the lists vary in length.
Here's the data:
id <- seq(1,7)
gradelist <- c("a;b;b",
"c;c",
"d;d;d;f",
"f;f;f;f;f;f",
"a;a;a;a",
"f;b;b;b;b;b;b;b",
"c;c;d;d;a;a")
df <- data.frame(id, gradelist)
df$gradelist <- as.character(df$gradelist)
I need to add another column to check whether all grades are the same for each id.
The output would look like:
We can extract the characters with str_extract_all and use n_distinct to check whether the number of distinct elements is 1
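library(dplyr)
library(purrr)
library(stringr)  # provides str_extract_all
df %>%
  mutate(same = map_chr(str_extract_all(gradelist, "[a-z]"),
                        # TRUE/FALSE is coerced to 1/0, so the index is 1 ("no") or 2 ("yes")
                        ~ c("no", "yes")[1 + (n_distinct(.x) == 1)]))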
# id gradelist same
#1 1 a;b;b no
#2 2 c;c yes
#3 3 d;d;d;f no
#4 4 f;f;f;f;f;f yes
#5 5 a;a;a;a yes
#6 6 f;b;b;b;b;b;b;b no
#7 7 c;c;d;d;a;a no
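The indexing idiom c("no", "yes")[1 + cond] works because R coerces TRUE to 1 and FALSE to 0, so the expression picks element 1 ("no") or element 2 ("yes"). Here is a minimal base R sketch of the idiom on its own; the vector name all_same is made up for illustration and is not part of the answer above.

```r
# Hypothetical logical vector: TRUE where all grades match
all_same <- c(FALSE, TRUE, FALSE)

# TRUE/FALSE coerce to 1/0, so 1 + all_same is the index 1 or 2
idx <- 1 + all_same

# Index into the two-element label vector
labels <- c("no", "yes")[idx]
print(labels)
```

If the idiom feels too terse, ifelse(all_same, "yes", "no") should give the same labels.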
Or make use of case_when
df %>%
mutate(same = map_chr(str_extract_all(gradelist, "[a-z]"), ~
case_when(n_distinct(.x) == 1 ~ "yes", TRUE ~ "no")))
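Or another option is separate_rows on 'gradelist' to expand the data into one row per grade, then count the distinct grades per id with n_distinct and join the result back to the original data
library(tidyr)
df %>%
  separate_rows(gradelist, sep = ";") %>%  # one row per grade
  distinct() %>%
  group_by(id) %>%
  summarise(same = c("no", "yes")[1 + (n_distinct(gradelist) == 1)]) %>%
  left_join(df, by = "id")                 # bring back the original gradelist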
Check which character comes first and replace all occurrences of that character (and the ; separators) with an empty string. If nothing is left, all characters are the same. Note that this assumes each grade is a single character.
sapply(df$gradelist, function(x) {
nchar(gsub(paste0(substring(x, 1, 1), "|;"), "", x)) == 0
}, USE.NAMES = FALSE)
#[1] FALSE TRUE FALSE TRUE TRUE FALSE FALSE
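If grades could be longer than one character, a variant that splits on the separator and compares whole grades may be safer. The following is a base R sketch under that assumption, not one of the original answers; the names parts, same and same_label are illustrative.

```r
# Same sample data as in the question
gradelist <- c("a;b;b", "c;c", "d;d;d;f", "f;f;f;f;f;f",
               "a;a;a;a", "f;b;b;b;b;b;b;b", "c;c;d;d;a;a")

# Split each string on the literal ";" separator
parts <- strsplit(gradelist, ";", fixed = TRUE)

# TRUE when a row contains exactly one unique grade
same <- vapply(parts, function(g) length(unique(g)) == 1, logical(1))

# Map the logical result to "yes"/"no" labels
same_label <- ifelse(same, "yes", "no")
print(same_label)
```

The result can be attached to the data frame with df$same <- same_label.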