 

Comparing two Unicode strings in PHP

I am stuck comparing two Unicode strings in PHP that both contain the special character 'ö'. One string comes from $_GET, the other is a folder name read from the filesystem (scandir()). Both strings look equal to me, and running

var_dump($filter);
var_dump($tail . '/' . $k);

on them also displays them as equal, but with different string lengths (?!):

string '/blöb' (length=7)
string '/blöb' (length=6)

My snippet comparing them looks as follows:

if($filter == ($tail . '/' . $k)) {
    /* ... */
}

What's going on here?

Additional information: $tail is an empty string:

string '' (length=0)
proximus asked May 19 '26 at 12:05



1 Answer

See http://en.wikipedia.org/wiki/Unicode_equivalence for background, and use PHP's Normalizer class: http://www.php.net/manual/en/class.normalizer.php

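You probably have a decomposed character in the longer string: an 'o' followed by a combining umlaut (diaeresis) character, which is rendered on top of the preceding character. The shorter string uses the single precomposed character 'ö' instead. Both look identical on screen, but their bytes differ, so == returns false.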
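To see the difference concretely, here is a small sketch (assuming PHP 7.0+ for the `\u{...}` escape) that builds both encodings of `/blöb` and compares them. The variable names are illustrative; the byte counts are consistent with the lengths 7 and 6 reported by `var_dump` in the question.

```php
<?php
// Precomposed form: "ö" is the single code point U+00F6 (2 bytes in UTF-8).
$composed = "/bl\u{00F6}b";

// Decomposed form: "o" followed by U+0308 COMBINING DIAERESIS (1 + 2 bytes).
$decomposed = "/blo\u{0308}b";

var_dump(strlen($composed));        // int(6)
var_dump(strlen($decomposed));      // int(7)
var_dump($composed == $decomposed); // bool(false)

// Hex dumps make the extra byte visible.
echo bin2hex($composed), PHP_EOL;   // 2f626cc3b662
echo bin2hex($decomposed), PHP_EOL; // 2f626c6fcc8862
```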

Normalizing both strings to the same normalization form (for example NFC) with Normalizer::normalize() fixes this.
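A minimal sketch of that fix, assuming the intl extension is installed (it provides the `Normalizer` class). `unicode_equals()` is a hypothetical helper name, not part of PHP; it normalizes both strings to NFC and then compares them strictly.

```php
<?php
// Hypothetical helper: compare two UTF-8 strings after normalizing both to NFC.
// Requires PHP's intl extension, which provides the Normalizer class.
function unicode_equals(string $a, string $b): bool
{
    $na = Normalizer::normalize($a, Normalizer::FORM_C);
    $nb = Normalizer::normalize($b, Normalizer::FORM_C);

    // normalize() does not return a string on failure (e.g. malformed UTF-8).
    if (!is_string($na) || !is_string($nb)) {
        return false;
    }
    return $na === $nb;
}

// Usage:
// unicode_equals("/blo\u{0308}b", "/bl\u{00F6}b"); // true, despite different bytes
```

In the question's snippet this would mean calling `unicode_equals($filter, $tail . '/' . $k)` instead of using `==`.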

As a side note, you should always normalize input that you use for equality checks. Usernames are a good example: you want to make sure two people can't register the same username just because the binary representations of their strings happen to differ.
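One way to apply that advice, sketched under the assumption that taken usernames live in a plain PHP array (a real application would more likely use a database column with a unique index) and that PHP 7.1+ with the intl extension is available. `normalize_username()` and `register_username()` are hypothetical helpers; they store each name under its NFC form, so the precomposed and decomposed spellings of the same name collide. NFC only merges canonically equivalent strings; whether a compatibility form such as NFKC suits usernames better depends on the application.

```php
<?php
// Hypothetical helper: NFC-normalize a username, or return null if
// normalization fails (e.g. malformed UTF-8). Requires the intl extension.
function normalize_username(string $name): ?string
{
    $normalized = Normalizer::normalize($name, Normalizer::FORM_C);
    return is_string($normalized) ? $normalized : null;
}

/**
 * Hypothetical helper: register $name if its normalized form is not taken yet.
 * @param array<string, true> $taken map of normalized usernames already in use
 */
function register_username(array &$taken, string $name): bool
{
    $key = normalize_username($name);
    if ($key === null || isset($taken[$key])) {
        return false; // invalid input, or an equivalent name is already taken
    }
    $taken[$key] = true;
    return true;
}

// Usage:
// $taken = [];
// register_username($taken, "J\u{00F6}rg");  // true
// register_username($taken, "Jo\u{0308}rg"); // false, same name once normalized
```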

Ariel answered May 22 '26 at 02:05
