Find first character that is different between two strings

Tags: string, php

Given two equal-length strings, is there an elegant way to get the offset of the first different character?

The obvious solution would be:

    for ($offset = 0; $offset < $length; ++$offset) {
        if ($str1[$offset] !== $str2[$offset]) {
            return $offset;
        }
    }

But that doesn't look quite right, for such a simple task.

asked Sep 19 '11 18:09 by NikiC



1 Answer

You can use a nice property of bitwise XOR (^) to achieve this. When you XOR two strings together, the bytes that are the same in both become null bytes ("\0"). So after XORing the two strings, we just need to find the position of the first non-null byte, which strspn gives us:

$position = strspn($string1 ^ $string2, "\0"); 
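To make the null-byte property visible, here is a minimal sketch that dumps the XOR result as hex before handing it to strspn. The variable names `$a`, `$b`, `$xor`, `$hex` and `$offset` are only illustrative, and the strings are assumed to be plain single-byte ASCII:

```php
<?php
// Two equal-length ASCII strings that differ only at byte offset 7.
$a = 'foobarbaz';
$b = 'foobarbiz';

// XOR on two string operands works byte by byte; equal bytes yield "\0".
$xor = $a ^ $b;

// Hex dump of the XOR result: every pair is 00 except the differing byte
// ('a' is 0x61, 'i' is 0x69, and 0x61 ^ 0x69 is 0x08).
$hex = bin2hex($xor);
echo $hex, "\n"; // 000000000000000800

// strspn() returns the length of the leading run of "\0" bytes,
// which is the offset of the first non-null byte.
$offset = strspn($xor, "\0");
echo $offset, "\n"; // 7
```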

That's all there is to it. So let's look at an example:

    $string1 = 'foobarbaz';
    $string2 = 'foobarbiz';
    $pos = strspn($string1 ^ $string2, "\0");

    printf(
        'First difference at position %d: "%s" vs "%s"',
        $pos, $string1[$pos], $string2[$pos]
    );

That will output:

First difference at position 7: "a" vs "i"

So that should do it. It's very efficient, since all the work happens inside PHP's built-in string operations, which are implemented in C, and it requires only a single extra copy of the string in memory (the XOR result).
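The one-liner assumes the strings actually differ and have equal length. If either assumption might not hold, a small wrapper can make the edge cases explicit. This is only a sketch: the function name `firstDifferenceOffset` is hypothetical, and it relies on PHP truncating the XOR result to the length of the shorter operand.

```php
<?php
/**
 * Hypothetical helper (the name is not from the original answer).
 * Returns the byte offset of the first difference, or null when the
 * two strings are identical.
 */
function firstDifferenceOffset(string $a, string $b): ?int
{
    // The XOR result is as long as the shorter operand, so $offset can
    // never exceed min(strlen($a), strlen($b)).
    $offset = strspn($a ^ $b, "\0");

    // No differing byte was found within the common length.
    if ($offset === min(strlen($a), strlen($b))) {
        // Equal lengths mean identical strings; otherwise one string is
        // a prefix of the other and they diverge where the shorter ends.
        return strlen($a) === strlen($b) ? null : $offset;
    }

    return $offset;
}

var_dump(firstDifferenceOffset('foobarbaz', 'foobarbiz')); // int(7)
var_dump(firstDifferenceOffset('foo', 'foo'));             // NULL
var_dump(firstDifferenceOffset('foo', 'foobar'));          // int(3)
```

Returning null rather than the string length for identical input keeps "no difference" distinguishable from a real offset.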

Edit: A MultiByte Solution Along The Same Lines:

    function getCharacterOffsetOfDifference($str1, $str2, $encoding = 'UTF-8') {
        return mb_strlen(
            mb_strcut(
                $str1,
                0, strspn($str1 ^ $str2, "\0"),
                $encoding
            ),
            $encoding
        );
    }

First the difference at the byte level is found using the above method and then the offset is mapped to the character level. This is done using the mb_strcut function, which is basically substr but honoring multibyte character boundaries.
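The boundary handling is easiest to see when the byte offset lands in the middle of a character. The sketch below assumes the mbstring extension is loaded and writes "f©o" with explicit UTF-8 byte escapes so the result does not depend on the source file's encoding; the variable names are illustrative only:

```php
<?php
// "f©o" spelled out as UTF-8 bytes: 66, C2 A9, 6F.
$str = "f\xC2\xA9o";

// A byte offset of 2 falls between the two bytes of "©" (C2 A9).
$byteOffset = 2;

// substr() cuts blindly at the byte offset, leaving a dangling lead byte.
$rawPrefix = substr($str, 0, $byteOffset);
echo bin2hex($rawPrefix), "\n"; // 66c2

// mb_strcut() backs up to the previous character boundary instead.
$prefix = mb_strcut($str, 0, $byteOffset, 'UTF-8');
echo bin2hex($prefix), "\n"; // 66

// Counting the characters in that prefix gives the character offset.
$charOffset = mb_strlen($prefix, 'UTF-8');
echo $charOffset, "\n"; // 1
```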

    var_dump(getCharacterOffsetOfDifference('foo', 'foa')); // 2
    var_dump(getCharacterOffsetOfDifference('©oo', 'foa')); // 0
    var_dump(getCharacterOffsetOfDifference('f©o', 'fªa')); // 1

It's not as elegant as the first solution, but it's still a one-liner (and a little simpler if you use the default encoding):

return mb_strlen(mb_strcut($str1, 0, strspn($str1 ^ $str2, "\0"))); 
answered Oct 01 '22 05:10 by ircmaxell