
Comparing UTF-8 String

I'm trying to compare two strings, let's say Émilie and Zoey. 'E' comes before 'Z', but É is not an ASCII character, and its UTF-8 bytes sort after 'Z', so a normal if ( str1 > str2 ) won't work.

I tried if (strcmp(str1,str2) > 0), but that still doesn't work. So I'm looking for a native way to compare strings containing UTF-8 characters.
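
For reference, a minimal sketch of why both comparisons order these two names the way they do. It assumes PHP 7 or later (for the "\u{...}" escape) and uses illustrative variable names that are not part of the original question.

```php
<?php
// "\u{C9}" is É (U+00C9); the escape keeps the example independent of the file's encoding.
$emilie = "\u{C9}milie";
$zoey   = 'Zoey';

// In UTF-8, É is stored as the two bytes 0xC3 0x89, while Z is the single byte 0x5A.
echo bin2hex("\u{C9}"), "\n"; // c389
echo bin2hex('Z'), "\n";      // 5a

// strcmp() and the > operator both compare bytes, so 0xC3 > 0x5A puts Émilie after Zoey.
var_dump(strcmp($emilie, $zoey) > 0); // bool(true)
var_dump($emilie > $zoey);            // bool(true)
```

Any fix therefore has to either map É to a plain E before comparing, or compare with locale-aware collation rules instead of raw bytes.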

poudigne asked Feb 01 '13 21:02



2 Answers

IMPORTANT

This answer is meant for situations where it's not possible to run or install the 'intl' extension. It only sorts strings by replacing accented characters with non-accented characters. To sort accented characters according to a specific locale, using a Collator is a better approach; see the other answer to this question for more information.

Sorting by non-accented characters in PHP 5.2

You may try converting both strings to ASCII using iconv() and the //TRANSLIT option to get rid of accented characters:

$str1 = iconv('utf-8', 'ascii//TRANSLIT', $str1);

Then do the comparison on the converted strings.

See the documentation here:

http://www.php.net/manual/en/function.iconv.php

[updated, in response to @Esailija's remark] I overlooked the problem of //TRANSLIT translating accented characters in unexpected ways. This problem is mentioned in this question: php iconv translit for removing accents: not working as excepted?

To make the 'iconv()' approach work, I've added a code sample below that strips all non-word characters from the resulting string using preg_replace().

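<?php

// The result of //TRANSLIT can depend on the current locale, so set one first.
setlocale(LC_ALL, 'fr_FR');

$names = array(
   'Zoey and another (word) ',
   'Émilie and another word',
   'Amber',
);

$converted = array();

foreach($names as $name) {
    // Transliterate to ASCII, then strip everything except word characters and whitespace.
    $converted[] = preg_replace('#[^\w\s]+#', '', iconv('UTF-8', 'ASCII//TRANSLIT', $name));
}

// Plain byte-wise sort, which is now safe because the strings are ASCII only.
sort($converted);

echo '<pre>'; print_r($converted);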

// Array
// (
//     [0] => Amber
//     [1] => Emilie and another word
//     [2] => Zoey and another word 
// )
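
The sample above sorts the converted strings, so the accents are lost in the result. A possible variation, sketched below, keeps the original names and uses the transliterated form only as a sort key. The helpers asciiSortKey() and sortByKey() are hypothetical names introduced for illustration, the code assumes PHP 7 or later, and the 'fr_FR.UTF-8' locale is assumed to be installed.

```php
<?php
// Hypothetical helper (not part of the original answer): ASCII sort key for a UTF-8 string.
function asciiSortKey(string $s): string
{
    $ascii = iconv('UTF-8', 'ASCII//TRANSLIT', $s);
    if ($ascii === false) {
        $ascii = $s; // conversion failed; fall back to the raw string
    }
    // Same cleanup as in the sample above: keep only word characters and whitespace.
    return preg_replace('#[^\w\s]+#', '', $ascii);
}

// Hypothetical helper: sort $names by $keyFn($name) while keeping the original strings.
function sortByKey(array $names, callable $keyFn): array
{
    usort($names, function ($a, $b) use ($keyFn) {
        return strcmp($keyFn($a), $keyFn($b));
    });
    return $names;
}

setlocale(LC_CTYPE, 'fr_FR.UTF-8'); // assumption: this locale exists on the system
print_r(sortByKey(array('Zoey', "\u{C9}milie", 'Amber'), 'asciiSortKey'));
```

Where transliteration behaves as in the sample above, this should list Émilie (with its accent intact) between Amber and Zoey. The result still depends on the platform's iconv implementation and locale.
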
thaJeztah answered Sep 19 '22 13:09


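There is no way to do this with the core string functions, but the intl extension (bundled with PHP since 5.3, and available from PECL for 5.2) provides the Collator class: http://php.net/manual/de/class.collator.php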

$c = new Collator('fr_FR');
if ($c->compare('Émily', 'Zoey') < 0) { echo 'Émily < Zoey'; }
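
The same class can also sort a whole list. The sketch below assumes the intl extension is loaded and PHP 7 or later. sortNamesForLocale() is a hypothetical wrapper introduced for illustration, while Collator::sort() and Collator::compare() are the extension's own methods.

```php
<?php
// Hypothetical wrapper: return $names sorted by the collation rules of $locale.
function sortNamesForLocale(array $names, string $locale): array
{
    $collator = new Collator($locale);
    // Collator::sort() reorders the array in place using locale-aware comparison.
    $collator->sort($names);
    return $names;
}

// Assumption: intl is available; without it the Collator class does not exist.
if (class_exists('Collator')) {
    $names = sortNamesForLocale(array('Zoey', "\u{C9}milie", 'Amber'), 'fr_FR');
    print_r($names);

    // Collator::compare() returns a negative, zero, or positive integer,
    // so it can also serve as a usort() callback.
    usort($names, array(new Collator('fr_FR'), 'compare'));
}
```

With French collation, É is treated as a variant of E, so Émilie should sort between Amber and Zoey without any transliteration step.
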
Fabian Schmengler answered Sep 21 '22 13:09