
How to delete characters in a string according to a second string?

Tags: r

Consider these two strings:

string1 <- "GCTCCC...CTCCATGAAGTA...CTTCACATCCGTGT.CCGGCCTGGCCGCGGAGAGCCC"
string_reference <- "GCTCCC...CTCCATGAAGTATTTCTTCACATCCGTGT.CCGGCCTGGCCGCGGAGAGCCC"

How do I easily remove the dots in "string1", but only those dots that sit at the same position as a dot in "string_reference"?

Expected output:

string1 = "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"
vitor asked Mar 24 '14 22:03


2 Answers

I'd just use R's truly vectorised subsetting and logical comparison methods...

# Split the strings
x <- strsplit( c( string1 , string_reference ) , "" )
# Compare and remove dots from string1 when dots also appear in the reference string at the same position
paste( x[[1]][ ! (x[[2]]== "." & x[[1]] == ".") ] , collapse = "" )
#[1] "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"
Simon O'Hanlon answered Sep 21 '22 04:09


Similar to Robert's, but the "vectorized" version:

s1 <- unlist(strsplit(string1, ""))
s2 <- unlist(strsplit(string_reference, ""))
paste0(Filter(Negate(is.na), ifelse(s1 == s2 & s1 == ".", NA, s1)), collapse="")
# [1] "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"

I put "vectorized" in quotes because the vectorization happens over the characters of a single string, not over the elements of a character vector. This assumes string1 and string_reference each contain exactly one element. If they held multiple elements, you would have to loop through the results of strsplit, one element at a time.
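For the multi-element case, one possible sketch is to pair up the strsplit results with mapply. The helper name remove_shared_dots and the short test strings are hypothetical, made up for illustration, and the sketch assumes each string has the same number of characters as its reference:

```r
# Hypothetical helper (not part of the answers above): for each element,
# drop the dots in `s` that sit at the same position as a dot in `ref`.
remove_shared_dots <- function(s, ref) {
  # Assumption: inputs are paired element by element and equal in width
  stopifnot(length(s) == length(ref), all(nchar(s) == nchar(ref)))
  # strsplit returns one character vector per element; mapply walks the pairs
  unname(mapply(
    function(a, b) paste(a[!(a == "." & b == ".")], collapse = ""),
    strsplit(s, ""), strsplit(ref, "")
  ))
}

# Made-up example data
strings    <- c("AC..GT", "A.C.G")
references <- c("AC.TGT", "A.CTG")
remove_shared_dots(strings, references)
# [1] "AC.GT" "AC.G"
```

The per-element work is the same logical mask used in the first answer; mapply only supplies the loop over elements.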

BrodieG answered Sep 22 '22 04:09