Using R to find the start difference of two strings

Tags: string, r, compare

I'm trying to use R to find where two strings start to differ, i.e. the position of the first letter at which they become different, and I'd like the function to return that position. However, the function always gives the value 2, and it seems the loop only runs one time.

Here is my code:

string1 = "CGCGGTGCATCCTGGGAGTTGTAGTTTTTTCTACTCAGAGGGAGAATAGCTCCAGACGGGAGCAGGATGA"
string2 = "CGCGGTGCATCCTGGGATGTAGTTTTTTCTACTCAGAGGGAGAATAGCTCCAGACGGGAGCAGGATGA"

location <- function(string1, string2){
  len1 = nchar(string1)
  len2 = nchar(string2)
  len = max(len1, len2)
  score = 1
  i = 1
  if (i <= len){
     if (substring(string1, i, i) == substring(string2, i, i)){
     score = score + 1
     i = i + 1
   }
  else if (substring(string1, i, i) != substring(string2, i, i)){
  break
   }
 }
  return(score)
}

location(string1, string2)

Thank you very much!

Chengjianning asked Sep 17 '18 02:09



2 Answers

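We can split both strings into single characters, compare them position by position, and get the first mismatch using which.min. The comparison yields a logical vector, and because FALSE sorts below TRUE, which.min returns the index of the first FALSE, i.e. the first position where the characters differ: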

which.min(strsplit(string1, "")[[1]] == strsplit(string2, "")[[1]])
#[1] 18

The above method returns a warning message when nchar(string1) is not equal to nchar(string2):

Warning message: In strsplit(string1, "")[[1]] == strsplit(string2, "")[[1]] : longer object length is not a multiple of shorter object length

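In most cases it is fine to ignore this message: recycling only affects positions beyond the end of the shorter string, so the answer is still correct as long as the first difference lies within the shorter string. Note, however, that which.min also returns 1 when the two strings are identical, because every comparison is then TRUE.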
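One way to avoid the recycling warning is to compare only the overlapping part of the two strings. The following is a minimal sketch, not part of the original answer: the helper name first_diff is hypothetical, and returning NA for identical strings and n + 1 when one string is a prefix of the other are assumptions chosen here for illustration.

```r
# Hypothetical helper: position of the first differing character.
first_diff <- function(a, b) {
  ca <- strsplit(a, "")[[1]]
  cb <- strsplit(b, "")[[1]]
  n  <- min(length(ca), length(cb))
  # Compare only the overlapping part, so no recycling takes place.
  mismatch <- which(ca[seq_len(n)] != cb[seq_len(n)])
  if (length(mismatch) > 0) return(mismatch[1])
  # No mismatch in the overlap: the strings differ only if their lengths do.
  if (length(ca) != length(cb)) return(n + 1L)
  NA_integer_
}

first_diff("Ronak", "Shah")    # 1
first_diff("abc", "abcd")      # 4 (one string is a prefix of the other)
first_diff("Ronak", "Ronak")   # NA
```

Using which(...)[1] on the mismatches, instead of which.min on the matches, makes the "no difference" case explicit rather than silently reporting position 1.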

To make it complete and reliable, we can write a function:

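location <- function(string1, string2) {
  # Compare only up to the length of the shorter string
  n <- min(nchar(string1), nchar(string2))
  i <- 1
  while (i <= n) {
    # Return the first position where the characters differ
    if (substr(string1, i, i) != substr(string2, i, i))
      return(i)
    i <- i + 1
  }
  # No mismatch found within the first n characters
  cat("There is no difference between two strings\n")
}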

location(string1, string2)
#[1] 18

location("Ronak", "Shah")
#[1] 1

location("Ronak", "Ronak")
#There is no difference between two strings
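As for why the function in the question always returns 2: if evaluates its condition only once, so score is incremented a single time (the first characters match) and then returned. A minimal fix, sketched below, is to replace if with while. The name location_fixed is hypothetical, and using min instead of the question's max for the loop bound is a choice made here, not something taken from the question.

```r
# Sketch of the question's function with `if` replaced by `while`.
location_fixed <- function(string1, string2) {
  len <- min(nchar(string1), nchar(string2))
  i <- 1
  # `while` repeats the test on every pass; the original `if` ran it once.
  while (i <= len && substring(string1, i, i) == substring(string2, i, i)) {
    i <- i + 1
  }
  # First position that differs, or len + 1 if the overlap is identical.
  i
}

location_fixed("Ronak", "Shah")    # 1
location_fixed("Ronak", "Ronak")   # 6, i.e. one past the end: no difference
```

Folding the character comparison into the loop condition also removes the need for break.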
Ronak Shah answered Oct 19 '22 15:10


The base function abbreviate can give the solution since, with its defaults, it tries to find the first character that makes the strings differ in order to produce unique abbreviations:

nchar(abbreviate(c(string1,string2),minlength=1)[1])
#CGCGGTGCATCCTGGGAGTTGTAGTTTTTTCTACTCAGAGGGAGAATAGCTCCAGACGGGAGCAGGATGA 
#                                                                    18

nchar(abbreviate(c("ABCDE","DEFGH"),minlength=1)[1])
#ABCDE 
#    1
Nicolas2 answered Oct 19 '22 16:10