filter/subset/delete rows that contain character in middle of string in R

Tags:

r

I've got a dataframe with a column containing peptide sequences and I want to keep only rows that have no internal "R" or "K" in their string.

df1 <- data.frame(
    Peptide = c("ABCOIIJUHFSAUJHR", "AOFIAUKOAISDFUK", "ASOIRDFHAOHFKK"))

df1  # check output

As output I would like to keep only the first row (i.e. "ABCOIIJUHFSAUJHR").

I have tried combining filter() from dplyr with str_locate_all() from the stringr package and length(), but couldn't figure it out.

Any help would be much appreciated.

Thanks, Moe

Moe asked Jan 29 '23 06:01


1 Answer

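We can skip the first and last characters (the leading ^. and the trailing .$) and require that everything in between is zero or more characters that are not R or K ([^RK]*). Passing that pattern to grepl gives a logical vector, which we use to subset the dataset: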

df1[grepl("^.[^RK]*.$", df1$Peptide), , drop = FALSE]
#           Peptide
#1 ABCOIIJUHFSAUJHR
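
For readers who find the single pattern hard to follow, the same idea can be spelled out in two steps. This is only a sketch in base R, not part of the original answer: the variable names internal, keep and result are made up for illustration. It cuts off the first and last character with substr and then checks what is left for an R or K.

```r
df1 <- data.frame(
    Peptide = c("ABCOIIJUHFSAUJHR", "AOFIAUKOAISDFUK", "ASOIRDFHAOHFKK"))

# Step 1: drop the first and last character, keeping only the internal part
# (internal is a hypothetical helper name)
internal <- substr(df1$Peptide, 2, nchar(as.character(df1$Peptide)) - 1)

# Step 2: TRUE for rows whose internal part contains neither R nor K
keep <- !grepl("[RK]", internal)

# Subset the data frame; drop = FALSE keeps it a data frame with one column
result <- df1[keep, , drop = FALSE]
result
```

One difference to be aware of: the regex version needs at least two characters to match, so a one-character peptide would be dropped there but kept by this sketch. For the example data both approaches select the same rows.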
akrun answered Jan 30 '23 19:01