
Check string pattern for non-unique characters

Tags: string, r, strsplit

I've a data frame with two columns: id and gradelist.

Each value in the gradelist column is a list of grades separated by ;, and the lists vary in length.

Here's the data:

id <- seq(1,7)
gradelist <- c("a;b;b",
            "c;c",
            "d;d;d;f",
            "f;f;f;f;f;f",
            "a;a;a;a",
            "f;b;b;b;b;b;b;b",
            "c;c;d;d;a;a")

df <- data.frame(id, gradelist)
df$gradelist <- as.character(df$gradelist)

I need to add another column that checks whether all grades are the same for each id.

The output would look like:

[image: expected output table]

user9292 asked Dec 14 '22 09:12


2 Answers

We can extract the characters with str_extract_all and use n_distinct to check whether the number of distinct elements is 1

library(dplyr)
library(purrr)
library(stringr)  # str_extract_all comes from stringr
df %>% 
   mutate(same = map_chr(str_extract_all(gradelist, "[a-z]"), 
       ~ c("no", "yes")[1+(n_distinct(.x)==1)]))
#   id       gradelist same
#1  1           a;b;b   no
#2  2             c;c  yes
#3  3         d;d;d;f   no
#4  4     f;f;f;f;f;f  yes
#5  5         a;a;a;a  yes
#6  6 f;b;b;b;b;b;b;b   no
#7  7     c;c;d;d;a;a   no

Or make use of case_when

df %>% 
   mutate(same = map_chr(str_extract_all(gradelist, "[a-z]"), ~
         case_when(n_distinct(.x) == 1 ~ "yes", TRUE ~ "no")))

Another option is to expand the data with separate_rows on 'gradelist', then use n_distinct within each id

library(tidyr)
df %>% 
    separate_rows(gradelist, sep = ";") %>%
    distinct() %>% 
    group_by(id) %>% 
    summarise(same = c("no", "yes")[1 + (n_distinct(gradelist) == 1)]) %>% 
    left_join(df, by = "id")
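
The same distinct-count idea can also be sketched in base R with strsplit, without loading any packages. This is only an illustration: the helper name all_same is hypothetical, and it assumes the grades are always separated by a literal ; with no surrounding whitespace.

```r
# Rebuild the example data so the sketch runs on its own.
gradelist <- c("a;b;b", "c;c", "d;d;d;f", "f;f;f;f;f;f",
               "a;a;a;a", "f;b;b;b;b;b;b;b", "c;c;d;d;a;a")
df <- data.frame(id = seq_along(gradelist), gradelist,
                 stringsAsFactors = FALSE)

# all_same() is a hypothetical helper: split each string on ";" and
# report whether exactly one distinct grade remains.
all_same <- function(x) {
  vapply(strsplit(x, ";", fixed = TRUE),
         function(g) length(unique(g)) == 1L,
         logical(1))
}

# Map the logical result to the "yes"/"no" labels used above.
df$same <- ifelse(all_same(df$gradelist), "yes", "no")
print(df)
```

vapply is used instead of sapply so that the result is guaranteed to be a logical vector with one element per row.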
akrun answered Jan 04 '23 12:01


Take the first character of the string and replace every occurrence of it, along with the ; separators, with an empty string. If nothing is left, all the grades are the same.

sapply(df$gradelist, function(x) {
    nchar(gsub(paste0(substring(x, 1, 1), "|;"), "", x)) == 0
}, USE.NAMES = FALSE)
#[1] FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE
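
To see why this works, the steps can be traced on single strings. The variable names below are introduced only for illustration, and the sketch assumes, as in the example data, that each grade is a single letter with no special meaning in a regular expression.

```r
x <- "d;d;d;f"
first    <- substring(x, 1, 1)      # "d", the first grade
pattern  <- paste0(first, "|;")     # "d|;" matches that grade or a separator
leftover <- gsub(pattern, "", x)    # "f", the grade that differs
nchar(leftover) == 0                # FALSE: not all grades are the same

y <- "c;c"
leftover_y <- gsub(paste0(substring(y, 1, 1), "|;"), "", y)  # "" (nothing left)
nchar(leftover_y) == 0              # TRUE: all grades are the same
```

If grades could be longer than one character, the first-character pattern would no longer identify a whole grade, so splitting on ; would likely be the safer route.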
d.b answered Jan 04 '23 10:01