I've got a dataframe with a column containing peptide sequences and I want to keep only rows that have no internal "R" or "K" in their string.
df1 <- data.frame(
Peptide = c("ABCOIIJUHFSAUJHR", "AOFIAUKOAISDFUK", 'ASOIRDFHAOHFKK'))
df1 #check output
As output I would like to keep only the first row (i.e. "ABCOIIJUHFSAUJHR").
I have tried using filter (dplyr) and str_locate_all from the stringr package and length but couldn't figure it out.
Any help would be much appreciated.
Thanks Moe
We can skip the first and last characters (^. and .$), match zero or more characters in between that are neither R nor K ([^RK]*) with grepl, and use the resulting logical vector to subset the dataset:
df1[grepl("^.[^RK]*.$", df1$Peptide), , drop = FALSE]
# Peptide
#1 ABCOIIJUHFSAUJHR
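To make the logic of the pattern more explicit, here is a minimal base R sketch that strips the first and last character with substr and then checks what remains for R or K. The helper name has_no_internal_RK is hypothetical and introduced only for illustration. Unlike the regex above, this version also keeps strings shorter than two characters, since their interior is empty.

```r
# Hypothetical helper (not from the original answer): TRUE when no "R" or "K"
# occurs strictly between the first and last character of each string.
has_no_internal_RK <- function(x) {
  # Interior of each string: everything except the first and last character.
  inner <- substr(x, 2, nchar(x) - 1)
  !grepl("[RK]", inner)
}

df1 <- data.frame(
  Peptide = c("ABCOIIJUHFSAUJHR", "AOFIAUKOAISDFUK", "ASOIRDFHAOHFKK"))

# Keep only rows whose peptide has no internal R or K.
result <- df1[has_no_internal_RK(df1$Peptide), , drop = FALSE]
result
```

Since the question mentions dplyr, the same pattern should also work inside filter, e.g. filter(df1, grepl("^.[^RK]*.$", Peptide)), assuming dplyr is loaded.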