 

How can I compare two strings to find the number of characters that match in R, using substitution distance?

In R, I have two character vectors, a and b.

a <- c("abcdefg", "hijklmnop", "qrstuvwxyz")
b <- c("abXdeXg", "hiXklXnoX", "Xrstuvwxyz")

I want a function that counts the character mismatches between each element of a and the corresponding element of b. Using the example above, such a function should return c(2,3,1). There is no need to align the strings. I need to compare each pair of strings character-by-character and count matches and/or mismatches in each pair. Does any such function exist in R?

Or, to ask the question in another way, is there a function to give me the edit distance between two strings, where the only allowed operation is substitution (ignore insertions or deletions)?
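Substitution-only edit distance between two strings of equal length is usually called the Hamming distance: the number of positions at which the characters differ. For strings x and y of length n:

```latex
d_H(x, y) = \sum_{i=1}^{n} \mathbf{1}[x_i \neq y_i]
```

For the first pair in the example, "abcdefg" and "abXdeXg" differ only at positions 3 and 6, so d_H = 2.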

asked Jun 24 '13 at 22:06 by Ryan C. Thompson



2 Answers

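Using some mapply fun: split each string into single characters with strsplit, then count the positions where each pair differs: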

mapply(function(x, y) sum(x != y), strsplit(a, ""), strsplit(b, ""))
#[1] 2 3 1
answered Oct 21 '22 at 06:10 by thelatemail

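A possible extension of the strsplit/mapply approach, sketched here rather than taken from the answer: wrap it in a function that reports both matches and mismatches for each pair, and stop with an error when a pair has unequal lengths. Comparing unequal-length character vectors with == or != recycles the shorter one, so the count would be misleading. The name count_matches is hypothetical, not an existing R function.

```r
# Hypothetical helper (not an existing R function): for each pair
# (a[i], b[i]), count characters that match and that differ by position.
count_matches <- function(a, b) {
  # Substitution-only comparison only makes sense for equal-length pairs;
  # otherwise `==` would silently recycle the shorter character vector.
  stopifnot(length(a) == length(b), all(nchar(a) == nchar(b)))
  matches <- mapply(function(x, y) sum(x == y), strsplit(a, ""), strsplit(b, ""))
  data.frame(a = a, b = b, matches = matches, mismatches = nchar(a) - matches)
}

a <- c("abcdefg", "hijklmnop", "qrstuvwxyz")
b <- c("abXdeXg", "hiXklXnoX", "Xrstuvwxyz")
count_matches(a, b)  # mismatches column: 2, 3, 1 (matches: 5, 6, 9)
```

Returning a data frame keeps each pair next to its counts; if only the mismatch vector is needed, the one-liner above is enough.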


Another option is to use adist (from the utils package), which computes the approximate string distance, a generalized Levenshtein distance, between character vectors:

mapply(adist,a,b)
abcdefg  hijklmnop qrstuvwxyz 
     2          3          1 
answered Oct 21 '22 at 07:10 by agstudy

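One caveat on the adist answer: adist also allows insertions and deletions, so it is not restricted to substitutions as the question asks. A quick check on a hypothetical pair where the two measures disagree (the strings x and y are made up for illustration):

```r
# Hypothetical pair: y is x rotated left by one character
x <- "abcdef"
y <- "bcdefa"

# Substitution-only (Hamming) count: compare characters position by position
hamming <- sum(strsplit(x, "")[[1]] != strsplit(y, "")[[1]])

# utils::adist returns a 1 x 1 matrix holding the generalized Levenshtein
# distance, which may use insertions and deletions as well as substitutions
lev <- utils::adist(x, y)[1, 1]

hamming  # 6: every position differs
lev      # 2: delete the leading "a", then append "a" at the end
```

For equal-length strings the Levenshtein distance never exceeds the Hamming distance, but it can be smaller, so the two agree on the question's example without being interchangeable in general.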