How to delete characters in a string according to a second string?

Question

Consider these two strings:

string1 <- "GCTCCC...CTCCATGAAGTA...CTTCACATCCGTGT.CCGGCCTGGCCGCGGAGAGCCC"
string_reference <- "GCTCCC...CTCCATGAAGTATTTCTTCACATCCGTGT.CCGGCCTGGCCGCGGAGAGCCC"

How do I easily remove the dots in "string1", but only those dots that are in the same position in "string_reference"?

Expected output:

string1 = "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"

Simon O'Hanlon · Accepted Answer

I'd just use R's truly vectorised subsetting and logical comparison methods...

# Split the strings
x <- strsplit( c( string1 , string_reference ) , "" )
# Compare and remove dots from string1 when dots also appear in the reference string at the same position
paste( x[[1]][ ! (x[[2]]== "." & x[[1]] == ".") ] , collapse = "" )
#[1] "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"
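
To see why the logical subsetting above works, here is a minimal sketch on two short toy strings. They are made up for illustration and are not from the question. It builds the mask of positions where both strings have a dot, then keeps everything else.

```r
# Toy inputs (hypothetical, same length on purpose)
a <- strsplit("A..C", "")[[1]]   # "A" "." "." "C"
b <- strsplit("A.GC", "")[[1]]   # "A" "." "G" "C"

# TRUE only where BOTH strings have a dot at that position
both_dots <- a == "." & b == "."
both_dots
# [1] FALSE  TRUE FALSE FALSE

# Keep every character of `a` that is not a shared dot
result <- paste(a[!both_dots], collapse = "")
result
# [1] "A.C"
```

The comparison is purely positional, so it only makes sense when both strings have the same number of characters.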

BrodieG · Answer

Similar to Robert's, but the "vectorized" version:

s1 <- unlist(strsplit(string1, ""))
s2 <- unlist(strsplit(string_reference, ""))
paste0(Filter(Negate(is.na), ifelse(s1 == s2 & s1 == ".", NA, s1)), collapse="")
# [1] "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"

I put "vectorized" in quotes because the vectorization happens over the characters within each string, not over the elements of your character vectors. This assumes each vector holds exactly one string. With multiple elements you would have to loop over the results of strsplit.
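
For the multi-element case, one possible sketch is to walk the two strsplit results in parallel with mapply. The helper name remove_shared_dots and the toy inputs are hypothetical, not part of either answer, and the function assumes each pair of strings has equal length.

```r
# Hypothetical helper: drop the dots in `x` that line up with dots in `ref`,
# pairing the i-th string of `x` with the i-th string of `ref`.
remove_shared_dots <- function(x, ref) {
  # Positional comparison needs equally long vectors and equally long strings
  stopifnot(length(x) == length(ref), all(nchar(x) == nchar(ref)))
  x_chars   <- strsplit(x, "")
  ref_chars <- strsplit(ref, "")
  # mapply processes one pair of character vectors at a time
  mapply(function(a, b) paste(a[!(a == "." & b == ".")], collapse = ""),
         x_chars, ref_chars, USE.NAMES = FALSE)
}

# Toy inputs (made up for illustration)
seqs <- c("AC..GT", "A.C.")
refs <- c("AC.TGT", "A.CG")
remove_shared_dots(seqs, refs)
# [1] "AC.GT" "AC."
```

Called with the single-element string1 and string_reference from the question, this should give the same result as the one-string versions above.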
