
IE8 doesn't handle ignoreCase RegExps in Greek

If I take some Greek month names and build a case-insensitive RegExp from each, the RegExp does not match the same month name in upper case:

<!doctype html>
<html>
<head>
</head>
<body>
<pre></pre>
<script>
    var names = [
        'Μάρτιος',
        'Μάιος',
        'Ιούνιος',
        'Ιούλιος',
        'Αύγουστος',
        'Νοέμβριος'
    ];
    var pre = document.getElementsByTagName('pre')[0];
    var i;
    for (i = 0; i < names.length; ++i) {
        var m = names[i];
        var r = new RegExp(m, 'i');
        pre.innerHTML += m + ' ' + r.test(m.toLocaleUpperCase()) + '\n';
    }
</script>
</body>
</html>

In IE8 this prints each name followed by false. In other browsers it prints true.

Iskren Ivov Chernev asked Oct 25 '13 08:10


3 Answers

Just use .toUpperCase() instead of .toLocaleUpperCase().

The latter translates Μάρτιος to ΜΆΡΤΙΟΣ, the former translates it to ΜΆΡΤΙΟς.

Which variant is correct I cannot say, though, because I don't know the capitalization rules for ς.
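To check what a given browser does, a small diagnostic along these lines may help. It is only a sketch: `lastCharHex` is a hypothetical helper, not part of the question's code, and the output depends on the engine it runs in.

```javascript
// Hypothetical helper: code point (as U+XXXX) of the last UTF-16 unit of s.
function lastCharHex(s) {
    var hex = s.charCodeAt(s.length - 1).toString(16).toUpperCase();
    return 'U+' + ('0000' + hex).slice(-4);
}

var name = 'Μάρτιος';
var plain = name.toUpperCase();
var localized = name.toLocaleUpperCase();

// U+03A3 is Σ (capital sigma); U+03C2 is ς (final sigma left unconverted).
var report = 'toUpperCase: ' + lastCharHex(plain) +
    ', toLocaleUpperCase: ' + lastCharHex(localized);
```

If the two code points in `report` differ, the two methods disagree about ς in that browser.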

Tomalak answered Oct 22 '22 00:10


Well, all the versions of IE available to me always translate Μάρτιος to ΜΆΡΤΙΟς, even when using .toUpperCase().

I assume the problem is the variant forms of some letters (http://de.wikipedia.org/wiki/Griechisches_Alphabet#Klassische_Zeichen).

For example, the letters Σ, σ, Ϲ and ς are all forms of sigma. The first two are the classic forms; the others are variants. Another example would be Β, β and ϐ for beta.

To ensure that these variants are recognized, I'd recommend a substitution before creating the regex.

Here is a short (possibly incomplete) helper function to do this:

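function regextendVariants(s)
{
    // Each row lists a base letter followed by its variant forms.
    var variants = [
        ['β', 'ϐ'],
        ['ε', 'ϵ'],
        ['θ', 'ϑ'],
        ['κ', 'ϰ'],
        ['π', 'ϖ'],
        ['ρ', 'ϱ'],
        ['σ', 'Ϲ', 'ς'],
        ['φ', 'ϕ']
    ];

    for (var j = 0; j < variants.length; j++) {
        var variant = variants[j];
        // Match every variant form (entries 1..n) of this letter in one pass,
        // so all occurrences are replaced and an already inserted character
        // class is never rewritten a second time.
        var forms = new RegExp('[' + variant.slice(1).join('') + ']', 'g');
        s = s.replace(forms, '[' + variant.join('') + ']');
    }

    return s;
}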

This function converts your strings to

  • Μάρτιο[σϹς]
  • Μάιο[σϹς]
  • Ιούνιο[σϹς]
  • Ιούλιο[σϹς]
  • Αύγουστο[σϹς]
  • Νοέμβριο[σϹς]

These patterns accept different variants of the same letter. I'm sure this is grammatically incorrect, but it should make the matching more robust.

In your code, replace

var r = new RegExp(m, 'i');

with

var r = new RegExp(regextendVariants(m), 'i');

As I said, my versions of IE don't reproduce the error, so I cannot promise this will be the final solution for your problem, but I hope it is ;)
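If widening the pattern is not an option, another approach is to avoid the RegExp's own case folding and normalise both strings before comparing them. The following is only a sketch under that assumption: `foldGreek` and `containsFolded` are hypothetical names, and the code has not been verified in IE8.

```javascript
// Hypothetical helper: lower-case the string, then fold final sigma (ς, U+03C2)
// into regular sigma (σ, U+03C3) so both forms compare equal.
function foldGreek(s) {
    return s.toLowerCase().replace(/\u03C2/g, '\u03C3');
}

// Case-insensitive substring test that does not rely on the RegExp 'i' flag.
function containsFolded(haystack, needle) {
    return foldGreek(haystack).indexOf(foldGreek(needle)) !== -1;
}

var names = ['Μάρτιος', 'Μάιος', 'Ιούνιος', 'Ιούλιος', 'Αύγουστος', 'Νοέμβριος'];
var results = [];
for (var i = 0; i < names.length; ++i) {
    results.push(containsFolded(names[i].toLocaleUpperCase(), names[i]));
}
```

Folding both sides the same way means the comparison does not depend on whether the engine upper-cases ς to Σ or leaves it alone.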

bukart answered Oct 21 '22 22:10


ς is U+03C2 (GREEK SMALL LETTER FINAL SIGMA), encoded as \xCF\x82 in UTF-8. The code point has been present since Unicode 1.1.

The Unicode Character Database (UCD) entry in SpecialCasing.txt for this is:

# <code>; <lower> ; <title> ; <upper> ; (<condition_list> ;)? # <comment>
03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA

where U+03A3 is the Greek Capital Letter Sigma (Σ). This is defined as far back as at least Unicode 2.1 Update 3 (http://www.unicode.org/Public/2.1-Update3/SpecialCasing-1.txt), so IE8 should support the case mapping.

Therefore, Σ is the correct capitalisation for ς.
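The code point, the UTF-8 bytes and the mapping can be checked from script. This is a minimal sketch: the variable names are illustrative, and the value of `mapsToCapital` depends on the engine's case mapping.

```javascript
var finalSigma = '\u03C2';   // ς
var capitalSigma = '\u03A3'; // Σ

// encodeURIComponent percent-encodes the UTF-8 bytes of the character,
// so this yields '%CF%82' for ς.
var utf8Bytes = encodeURIComponent(finalSigma);

// True when the engine follows the Unicode upper-case mapping for ς.
var mapsToCapital = finalSigma.toUpperCase() === capitalSigma;
```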

The MSDN documentation for the toUpperCase and toLocaleUpperCase functions says that both use the Unicode case mappings. The toLocaleUpperCase function uses system locale case mappings if there is a conflict with the current system locale (e.g. for some Turkish mappings). Thus, if you just want the Unicode case mappings you should use toUpperCase.

reece answered Oct 21 '22 22:10