I'm trying to compare two strings, let's say Émilie and Zoey. 'E' comes before 'Z', but in byte order 'Z' comes before 'É' (which is not even an ASCII character), so a normal if ( str1 > str2 ) won't work.
I tried if (strcmp(str1,str2) > 0) and it still doesn't work. So I'm looking for a native way to compare strings with UTF-8 characters.
In Java, you should not use == (the equality operator) to compare strings, because it compares references, i.e. whether the two operands are the same object. The equals() method, on the other hand, compares the values of the strings rather than the objects themselves.
In other words, strings are compared letter by letter. The algorithm is simple: compare the first character of both strings. If the first string's character is greater (or less) than the second string's, the first string is greater (or less) and the comparison stops. Otherwise, move on to the next character and repeat; if one string runs out first, the shorter string is the lesser one.
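To see why this goes wrong for Émilie and Zoey, here is a minimal sketch of that letter-by-letter comparison in PHP, working on raw bytes much as strcmp() does. The function name compareBytewise is hypothetical and used only for illustration, and the sketch assumes the source file is saved as UTF-8.

```php
<?php
// Hypothetical helper (not a PHP built-in): compares two strings byte by byte,
// which is essentially what strcmp() does.
function compareBytewise($a, $b)
{
    $len = min(strlen($a), strlen($b));
    for ($i = 0; $i < $len; $i++) {
        $diff = ord($a[$i]) - ord($b[$i]);
        if ($diff !== 0) {
            return $diff; // the first differing byte decides the order
        }
    }
    // One string is a prefix of the other: the shorter one sorts first.
    return strlen($a) - strlen($b);
}

$str1 = "Émilie";
$str2 = "Zoey";

// In UTF-8, "É" is stored as the bytes 0xC3 0x89, while "Z" is the single byte 0x5A.
printf("%02X\n", ord($str1[0])); // C3
printf("%02X\n", ord($str2[0])); // 5A

// 0xC3 > 0x5A, so "Émilie" sorts after "Zoey" in a byte-wise comparison.
var_dump(compareBytewise($str1, $str2) > 0); // bool(true)
```

The comparison never looks at letters at all, only at byte values, which is why any character outside ASCII ends up after 'Z'.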
Using String.equals(): In Java, the equals() method compares two strings based on their content. If the contents of both strings are the same, it returns true. If any character does not match, it returns false.
Solution. Java's String class also offers compareTo(String) and compareToIgnoreCase(String). Both compare two strings lexicographically and return the difference between the character values at the first position where the strings differ.
IMPORTANT
This answer is meant for situations where it's not possible to run/install the 'intl' extension, and it only sorts strings by replacing accented characters with non-accented characters. To sort accented characters according to a specific locale, using a Collator is a better approach -- see the other answer to this question for more information.
Sorting by non-accented characters in PHP 5.2
You may try converting both strings to ASCII using iconv() and the //TRANSLIT option to get rid of accented characters:
$str1 = iconv('UTF-8', 'ASCII//TRANSLIT', $str1);
$str2 = iconv('UTF-8', 'ASCII//TRANSLIT', $str2);
Then do the comparison on the converted strings.
See the documentation here:
http://www.php.net/manual/en/function.iconv.php
[updated, in response to @Esailija's remark] I overlooked the problem of //TRANSLIT translating accented characters in unexpected ways. This problem is mentioned in this question: php iconv translit for removing accents: not working as excepted?
To make the iconv() approach work, I've added a code sample below that uses preg_replace() to strip every character that is neither a word character nor whitespace from the resulting string.
<?php
// The result of //TRANSLIT can depend on the current locale.
setlocale(LC_ALL, 'fr_FR');

$names = array(
    'Zoey and another (word) ',
    'Émilie and another word',
    'Amber',
);

$converted = array();
foreach ($names as $name) {
    // Transliterate to ASCII, then drop anything that is not a word character or whitespace.
    $converted[] = preg_replace('#[^\w\s]+#', '', iconv('UTF-8', 'ASCII//TRANSLIT', $name));
}

sort($converted);

echo '<pre>'; print_r($converted);
// Array
// (
//     [0] => Amber
//     [1] => Emilie and another word
//     [2] => Zoey and another word
// )
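Note that the sample above sorts and prints the converted strings, not the original names. If the originals are needed in sorted order, one option is to use the converted strings purely as sort keys. The following is a rough sketch of that idea; asciiKey() and sortByAsciiKey() are hypothetical helper names, and it assumes the fr_FR locale is available (the //TRANSLIT result varies between platforms, so no output is shown).

```php
<?php
// Assumption: the fr_FR locale is installed on this system.
setlocale(LC_ALL, 'fr_FR');

// Hypothetical helper: builds the accent-free key that is used only for comparing.
function asciiKey($name)
{
    $ascii = iconv('UTF-8', 'ASCII//TRANSLIT', $name);
    return preg_replace('#[^\w\s]+#', '', $ascii);
}

// Hypothetical helper: returns the original names ordered by their ASCII keys.
function sortByAsciiKey(array $names)
{
    $keys = array_map('asciiKey', $names);
    // array_multisort() sorts $keys and reorders $names in the same way.
    array_multisort($keys, SORT_STRING, $names);
    return $names;
}

$names = array(
    'Zoey and another (word) ',
    'Émilie and another word',
    'Amber',
);

print_r(sortByAsciiKey($names));
```

When transliteration works as intended, the original strings, accents and parentheses included, should come back with Amber first and Zoey last.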
There is no native way to do this in the PHP core, but the intl extension (available from PECL, and bundled with PHP since 5.3) provides a Collator class: http://php.net/manual/de/class.collator.php
$c = new Collator('fr_FR');
if ($c->compare('Émily', 'Zoey') < 0) { echo 'Émily < Zoey'; }
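The same Collator can also sort a whole list rather than compare a single pair. A small sketch, assuming the intl extension is loaded; the $names array is made up for illustration:

```php
<?php
// Requires the intl extension.
$names = array('Zoey', 'Émilie', 'Amber');

$collator = new Collator('fr_FR');

// Collator::sort() sorts the array in place using locale-aware comparison.
$collator->sort($names);

print_r($names); // Amber, Émilie, Zoey
```

If a custom sort is needed, usort($names, array($collator, 'compare')) should give the same ordering, since Collator::compare() returns a negative, zero, or positive value like any other comparison callback.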