I'm trying to use R to find where two strings start to differ, i.e. the position of the first letter at which they differ, and I'd like the function to return that position. However, the function always returns 2, and it seems the loop only runs once.
Here is my code:
string1 = "CGCGGTGCATCCTGGGAGTTGTAGTTTTTTCTACTCAGAGGGAGAATAGCTCCAGACGGGAGCAGGATGA"
string2 = "CGCGGTGCATCCTGGGATGTAGTTTTTTCTACTCAGAGGGAGAATAGCTCCAGACGGGAGCAGGATGA"
location <- function(string1, string2){
  len1 = nchar(string1)
  len2 = nchar(string2)
  len = max(len1, len2)
  score = 1
  i = 1
  if (i <= len){
    if (substring(string1, i, i) == substring(string2, i, i)){
      score = score + 1
      i = i + 1
    }
    else if (substring(string1, i, i) != substring(string2, i, i)){
      break
    }
  }
  return(score)
}
location(string1, string2)
Thank you very much!
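Why the original code returns 2: `if` evaluates its condition only once, so after the first characters match, `score` becomes 2 and the function returns. `break` is only valid inside a loop, so a mismatch at position 1 would raise an error instead of stopping. A minimal sketch that keeps the question's structure is to replace `if` with `while` and use the shorter length as the bound (that bound is a choice made for this sketch):

```r
location <- function(string1, string2) {
  len <- min(nchar(string1), nchar(string2))  # only compare positions both strings have
  score <- 1
  i <- 1
  while (i <= len) {  # `while` repeats; the original `if` ran only once
    if (substring(string1, i, i) == substring(string2, i, i)) {
      score <- score + 1  # characters match: move on to the next position
      i <- i + 1
    } else {
      break  # first mismatch: stop, `score` is its position
    }
  }
  score
}

location("CGCGGTGCATCCTGGGAGTTG", "CGCGGTGCATCCTGGGATGTAG")
```

With this version, identical strings return `nchar + 1`, and a string that is a prefix of the other returns the shorter length + 1.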
We can split both strings into single characters, compare them position by position, and get the first mismatch using which.min (on a logical vector it returns the index of the first FALSE):
which.min(strsplit(string1, "")[[1]] == strsplit(string2, "")[[1]])
#[1] 18
The above method gives a warning when nchar(string1) is not equal to nchar(string2) (strictly, when the longer length is not a multiple of the shorter one, because == recycles the shorter vector):
Warning message: In strsplit(string1, "")[[1]] == strsplit(string2, "")[[1]] : longer object length is not a multiple of shorter object length
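In most cases it is fine to ignore this warning: as long as the strings differ somewhere within the length of the shorter one, the answer is still correct. It can mislead when one string is a prefix of the other, because the recycled characters may happen to match, and which.min returns 1 when every comparison is TRUE (which is also what happens for identical strings).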
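A middle ground that keeps the vectorised comparison but avoids recycling is to compare only the positions both strings have. This is a minimal sketch; `first_diff` is a hypothetical name, and returning `NA` for identical strings is an assumption of the sketch:

```r
first_diff <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- min(length(x), length(y))
  # compare only the overlapping characters, so nothing is recycled
  mism <- which(x[seq_len(n)] != y[seq_len(n)])
  if (length(mism) > 0) {
    mism[1]          # first mismatching position
  } else if (length(x) != length(y)) {
    n + 1L           # one string is a prefix of the other
  } else {
    NA_integer_      # identical strings
  }
}
```

For "abc" vs "abca", the recycled which.min comparison returns 1 (with a warning), whereas first_diff("abc", "abca") returns 4.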
To make it complete and reliable without relying on recycling, we can also write a loop-based function:
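location <- function(string1, string2) {
  n = min(nchar(string1), nchar(string2))
  i = 1
  # walk both strings until the first mismatching character
  while (i <= n) {
    if (substr(string1, i, i) != substr(string2, i, i))
      return(i)
    i = i + 1
  }
  # no mismatch so far: if one string is longer, they differ right after the shorter one ends
  if (nchar(string1) != nchar(string2))
    return(n + 1)
  cat("There is no difference between two strings")
}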
location(string1, string2)
#[1] 18
location("Ronak", "Shah")
#[1] 1
location("Ronak", "Ronak")
#There is no difference between two strings
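If you have many pairs to compare (say, two vectors of sequences), the same character-by-character loop can be applied pairwise with mapply. This is a sketch assuming both vectors have the same length; `first_diff_pairs` is a hypothetical helper, and it returns NA for identical pairs instead of printing a message:

```r
first_diff_pairs <- function(x, y) {
  stopifnot(length(x) == length(y))
  one_pair <- function(a, b) {
    n <- min(nchar(a), nchar(b))
    i <- 1L
    # advance while the characters at position i agree (&& skips substr once i > n)
    while (i <= n && substr(a, i, i) == substr(b, i, i)) i <- i + 1L
    # mismatch found, or one string is a prefix of the other; otherwise identical
    if (i <= n || nchar(a) != nchar(b)) i else NA_integer_
  }
  mapply(one_pair, x, y, USE.NAMES = FALSE)
}

first_diff_pairs(c("Ronak", "abc", "abc"), c("Shah", "abcd", "abc"))
```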
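The base function abbreviate can also give the solution: with its defaults, it shortens the strings and then lengthens any abbreviations that collide until they are unique, so for strings like these the abbreviation length is the position of the first character at which they differ. This relies on the sequences containing only upper-case letters; abbreviate drops spaces and lower-case letters first, so for strings containing those the count may not match the first differing position: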
nchar(abbreviate(c(string1,string2),minlength=1)[1])
#CGCGGTGCATCCTGGGAGTTGTAGTTTTTTCTACTCAGAGGGAGAATAGCTCCAGACGGGAGCAGGATGA
# 18
nchar(abbreviate(c("ABCDE","DEFGH"),minlength=1)[1])
#ABCDE
# 1