If I take some Greek month names and build a case-insensitive RegExp from each of them, it won't match the same month in upper case:
<!doctype html>
<html>
<head>
</head>
<body>
<pre></pre>
<script>
var names = [
  'Μάρτιος',
  'Μάιος',
  'Ιούνιος',
  'Ιούλιος',
  'Αύγουστος',
  'Νοέμβριος'
];
var pre = document.getElementsByTagName('pre')[0];
var i;
for (i = 0; i < names.length; ++i) {
  var m = names[i];
  var r = new RegExp(m, 'i');
  pre.innerHTML += m + ' ' + r.test(m.toLocaleUpperCase()) + '\n';
}
</script>
</body>
</html>
In IE8 this prints each name followed by false. In other browsers it prints true.
Just use .toUpperCase() instead of .toLocaleUpperCase(). The latter translates Μάρτιος to ΜΆΡΤΙΟΣ; the former translates it to ΜΆΡΤΙΟς. Which variant is correct I cannot say, though, because I don't know the capitalization rules for ς.
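To see which mapping a particular browser actually applies, a small diagnostic along these lines could be run in IE8 and in another browser and the results compared. The helper name compareUpperCasing is hypothetical; it only relies on toUpperCase, toLocaleUpperCase and RegExp.prototype.test, which IE8 also provides. No expected output is given for IE8, since I have not verified what it returns.

```javascript
// Hypothetical diagnostic helper: for one word, report what each
// upper-casing method returns and whether a case-insensitive RegExp
// built from the original word matches that result.
function compareUpperCasing(word) {
  var re = new RegExp(word, 'i');
  var upper = word.toUpperCase();
  var localeUpper = word.toLocaleUpperCase();
  return {
    upper: upper,
    localeUpper: localeUpper,
    upperMatches: re.test(upper),
    localeUpperMatches: re.test(localeUpper)
  };
}

// Usage: inspect the result in each browser you want to compare.
var result = compareUpperCasing('Μάρτιος');
```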
Well, all the versions of IE available to me always translate Μάρτιος to ΜΆΡΤΙΟς, even when using .toUpperCase().
I assume the problem is the variant forms of some letters (http://de.wikipedia.org/wiki/Griechisches_Alphabet#Klassische_Zeichen).
For example, the letters Σ, σ, Ϲ and ς are all forms of sigma. The first two are the classical ones; the others are variants. Another example would be Β, β and ϐ for beta.
To ensure that these variants are recognized, I'd recommend a substitution before creating the regex.
Here is a short (possibly incomplete) helper function that does this:
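function regextendVariants(s)
{
  // Each group lists a base letter followed by its variant forms.
  var variants = [
    ['β', 'ϐ'],
    ['ε', 'ϵ'],
    ['θ', 'ϑ'],
    ['κ', 'ϰ'],
    ['π', 'ϖ'],
    ['ρ', 'ϱ'],
    ['σ', 'Ϲ', 'ς'],
    ['φ', 'ϕ']
  ];
  var result = '';
  // Walk the input once, so every occurrence is handled (not just the
  // first) and an inserted character class is never rescanned.
  for (var i = 0; i < s.length; i++) {
    var c = s.charAt(i);
    var replacement = c;
    for (var j = 0; j < variants.length; j++) {
      // Only the variant forms (index 1 and up) are widened.
      for (var k = 1; k < variants[j].length; k++) {
        if (variants[j][k] === c) {
          replacement = '[' + variants[j].join('') + ']';
        }
      }
    }
    result += replacement;
  }
  return result;
}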
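This function converts your strings so that each variant letter is replaced by a character class listing its whole group; for example, the final ς in Μάρτιος becomes [σϹς], giving Μάρτιο[σϹς].
These patterns accept any variant of the same letter. I'm sure this admits spellings that are not correct Greek, but it should make the matching more robust.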
In your code, you have to replace
var r = new RegExp(m, 'i');
with
var r = new RegExp(regextendVariants(m), 'i');
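A different approach, not the one proposed above, is to normalize both strings instead of widening the pattern: fold the sigma forms to σ, lowercase both sides, and search with indexOf. The helper names foldSigma and containsIgnoringCaseAndSigma are hypothetical, only the sigma group is folded, and the sketch assumes the engine's toLowerCase maps accented capitals such as Ά back correctly, which I have not checked on IE8.

```javascript
// Hypothetical helper: replace ς (U+03C2), lunate ϲ (U+03F2),
// lunate Ϲ (U+03F9) and capital Σ (U+03A3) with σ (U+03C3).
function foldSigma(s) {
  return s.replace(/[\u03C2\u03F2\u03F9\u03A3]/g, '\u03C3');
}

// Hypothetical helper: case-insensitive substring search that treats
// all sigma forms as the same letter.
function containsIgnoringCaseAndSigma(haystack, needle) {
  var h = foldSigma(haystack).toLowerCase();
  var n = foldSigma(needle).toLowerCase();
  return h.indexOf(n) !== -1;
}
```

Folding before lowercasing means no capital Σ remains for the lowercasing step, so the context-dependent final-sigma rule cannot reintroduce ς.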
As I said, my versions of IE don't show the error, so I cannot promise this will be the final solution to your problem, but I hope it is ;)
ς is \xCF\x82 in UTF-8, or U+03C2 as the hexadecimal value of its Unicode code point, which has been present since Unicode 1.1.
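Both values can be checked in a modern engine with a short sketch like the one below. The helper name describeChar is hypothetical, and it uses codePointAt, Array.from, padStart and TextEncoder, none of which exist in IE8 (they are available in current browsers and in Node.js 11+).

```javascript
// Hypothetical helper: return the code point and the UTF-8 bytes of a
// single character, formatted as U+XXXX and \xHH\xHH...
function describeChar(ch) {
  var cp = ch.codePointAt(0);
  var bytes = Array.from(new TextEncoder().encode(ch), function (b) {
    return '\\x' + b.toString(16).toUpperCase().padStart(2, '0');
  });
  return {
    codePoint: 'U+' + cp.toString(16).toUpperCase().padStart(4, '0'),
    utf8: bytes.join('')
  };
}

console.log(describeChar('ς').codePoint); // U+03C2
console.log(describeChar('ς').utf8);      // \xCF\x82
```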
The Unicode Character Database (UCD) entry in SpecialCasing.txt for this is:
# <code>; <lower> ; <title> ; <upper> ; (<condition_list> ;)? # <comment>
03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA
where U+03A3 is GREEK CAPITAL LETTER SIGMA (Σ). This mapping has been defined since at least Unicode 2.1 Update 3 (http://www.unicode.org/Public/2.1-Update3/SpecialCasing-1.txt), so IE8 should support it.
Therefore, Σ is the correct capitalisation for ς.
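If you want to detect at runtime whether an engine follows this mapping, a feature test along these lines could work. The function name sigmaCaseMappingOk is hypothetical; it uses only ES3 features, so it should also run in IE8, but I have not verified what IE8 returns.

```javascript
// Hypothetical feature test: does this engine upper-case both small
// sigma forms to Σ and treat all three forms as equal under the RegExp
// 'i' flag, as the Unicode case mappings require?
function sigmaCaseMappingOk() {
  var smallSigma = '\u03C3';   // σ
  var finalSigma = '\u03C2';   // ς
  var capitalSigma = '\u03A3'; // Σ
  return finalSigma.toUpperCase() === capitalSigma &&
         smallSigma.toUpperCase() === capitalSigma &&
         new RegExp(finalSigma, 'i').test(capitalSigma) &&
         new RegExp(capitalSigma, 'i').test(finalSigma);
}
```

If it returns false, falling back to a variant-aware pattern or to folding the sigma forms is one option.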
The MSDN documentation for the toUpperCase and toLocaleUpperCase functions says that both use the Unicode case mappings. The toLocaleUpperCase function uses the system locale's case mappings if there is a conflict with the current system locale (for example, some Turkish mappings). Thus, if you just want the Unicode case mappings, you should use toUpperCase.
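The Turkish case is the classic example of such a conflict. The sketch below, with a hypothetical helper upperBothWays, upper-cases a string both ways. Note that passing a locale argument to toLocaleUpperCase is an ECMA-402 (Intl) addition that IE8 lacks; there, only the system locale is used.

```javascript
// Hypothetical helper: upper-case a string with the locale-independent
// Unicode mappings and with the (optional) locale-specific ones.
function upperBothWays(s, locale) {
  return {
    plain: s.toUpperCase(),
    localized: locale === undefined ? s.toLocaleUpperCase() : s.toLocaleUpperCase(locale)
  };
}

// With Turkish locale data, localized is expected to be 'İ' (U+0130),
// while plain is always 'I'.
var turkish = upperBothWays('i', 'tr-TR');
```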