In the code of some libraries (e.g. AngularJS), I can see that custom case-conversion functions are used instead of the standard ones. This is justified by the assumption that in browsers with a Turkish locale, the standard functions don't work as expected:
console.log("SCRIPT".toLowerCase()); // "scrıpt"
console.log("script".toUpperCase()); // "SCRİPT"
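For comparison, here's a sketch you can run in Node.js or a modern browser console. It assumes an engine with ECMA-402 (Intl) support, where toLocaleLowerCase() and toLocaleUpperCase() accept an explicit locale argument (a later addition to the language). The Turkish results above are what the locale-aware methods produce for "tr"; the plain methods should return the ASCII results.

```javascript
// Plain methods: the spec requires locale-independent Unicode case mappings.
const plainLower = "SCRIPT".toLowerCase();
const plainUpper = "script".toUpperCase();

// Locale-aware methods with an explicit Turkish locale (needs Intl/ICU support).
const turkishLower = "SCRIPT".toLocaleLowerCase("tr");
const turkishUpper = "script".toLocaleUpperCase("tr");

console.log(plainLower, plainUpper);     // script SCRIPT
console.log(turkishLower, turkishUpper); // scrıpt SCRİPT
```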
But is this really the case, or was it ever? Do browsers really behave this way? If so, which ones? What about Node.js? Other JS engines?
The existence of the toLocaleLowerCase and toLocaleUpperCase methods implies that toLowerCase and toUpperCase are locale-invariant, doesn't it?
For which browsers, specifically, does the Angular team keep this check in the code: if ('i' !== 'I'.toLowerCase())...?
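The fallback that such a check guards is easy to sketch. Below is a minimal, hypothetical version in the spirit of those custom functions (the names manualLowercase/manualUppercase and the bit trick are my assumptions, not a quote of the Angular source). It maps only ASCII A-Z/a-z, so it can never produce ı or İ whatever the locale, at the cost of leaving every non-ASCII letter untouched.

```javascript
// Hypothetical ASCII-only case converters. Letters outside A-Z / a-z are left as-is.
// In ASCII, upper- and lowercase letters differ only in bit 0x20.
function manualLowercase(s) {
  return s.replace(/[A-Z]/g, (ch) => String.fromCharCode(ch.charCodeAt(0) | 0x20));
}

function manualUppercase(s) {
  return s.replace(/[a-z]/g, (ch) => String.fromCharCode(ch.charCodeAt(0) & ~0x20));
}

// Use the native methods unless they fail the same sanity check as in the question.
const needsFallback = "i" !== "I".toLowerCase();
const lowercase = needsFallback ? manualLowercase : (s) => s.toLowerCase();
const uppercase = needsFallback ? manualUppercase : (s) => s.toUpperCase();

console.log(lowercase("SCRIPT"), uppercase("script")); // script SCRIPT
console.log(manualLowercase("ÇİĞDEM"));                // ÇİĞdem (non-ASCII untouched)
```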
If your browser (device) uses the Turkish or Azerbaijani locale, please run this snippet and let me know if you find that the issue really exists.
if ('i' !== 'I'.toLowerCase()) {
  document.write('Ooops! toLowerCase is locale-sensitive in your browser. ' +
    'Please write your user-agent in the comments to this question: ' + navigator.userAgent);
} else {
  document.write('toLowerCase isn\'t locale-sensitive in your browser. ' +
    'Everything works as expected!');
}
<html lang="tr">
Note: I couldn't test this myself!
As per the ECMAScript specification:
String.prototype.toLowerCase ( )
[...]
For the purposes of this operation, the 16-bit code units of the Strings are treated as code points in the Unicode Basic Multilingual Plane. Surrogate code points are directly transferred from S to L without any mapping.
The result must be derived according to the case mappings in the Unicode character database (this explicitly includes not only the UnicodeData.txt file, but also the SpecialCasings.txt file that accompanies it in Unicode 2.1.8 and later).
[...]
String.prototype.toLocaleLowerCase ( )
This function works exactly the same as toLowerCase except that its result is intended to yield the correct result for the host environment’s current locale, rather than a locale-independent result. There will only be a difference in the few cases (such as Turkish) where the rules for that language conflict with the regular Unicode case mappings.
[...]
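To see what "only a difference in the few cases" means in practice, here's a small comparison sketch. It assumes an engine with ECMA-402 (Intl) support, so that toLocaleLowerCase() can take an explicit locale instead of depending on the OS setting; the sample words are arbitrary picks. Only the strings containing I or İ should come out differently.

```javascript
// Compare locale-independent and locale-specific lowercasing on sample strings.
const samples = ["HELLO", "SCRIPT", "ISTANBUL", "ÇİĞDEM"];

function differsInLocale(str, locale) {
  return str.toLowerCase() !== str.toLocaleLowerCase(locale);
}

for (const s of samples) {
  console.log(
    s,
    JSON.stringify(s.toLowerCase()),
    JSON.stringify(s.toLocaleLowerCase("tr")),
    differsInLocale(s, "tr") ? "differs" : "same"
  );
}
```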
And as per the Unicode Character Database's SpecialCasing.txt:
[...]
Format
The entries in this file are in the following machine-readable format:
<code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>
Unconditional mappings
[...]
Preserve canonical equivalence for I with dot. Turkic is handled below.
0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE
[...]
Language-Sensitive Mappings
These are characters whose full case mappings depend on language and perhaps also context (which characters come before or after). For more information see the header of this file and the Unicode Standard.
Lithuanian
Lithuanian retains the dot in a lowercase i when followed by accents.
Remove DOT ABOVE after "i" with upper or titlecase
0307; 0307; ; ; lt After_Soft_Dotted; # COMBINING DOT ABOVE
Introduce an explicit dot above when lowercasing capital I's and J's whenever there are more accents above. (of the accents used in Lithuanian: grave, acute, tilde above, and ogonek)
0049; 0069 0307; 0049; 0049; lt More_Above; # LATIN CAPITAL LETTER I
004A; 006A 0307; 004A; 004A; lt More_Above; # LATIN CAPITAL LETTER J
012E; 012F 0307; 012E; 012E; lt More_Above; # LATIN CAPITAL LETTER I WITH OGONEK
00CC; 0069 0307 0300; 00CC; 00CC; lt; # LATIN CAPITAL LETTER I WITH GRAVE
00CD; 0069 0307 0301; 00CD; 00CD; lt; # LATIN CAPITAL LETTER I WITH ACUTE
0128; 0069 0307 0303; 0128; 0128; lt; #LATIN CAPITAL LETTER I WITH TILDE
Turkish and Azeri
I and i-dotless; I-dot and i are case pairs in Turkish and Azeri. The following rules handle those cases.
0130; 0069; 0130; 0130; tr; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0130; 0069; 0130; 0130; az; # LATIN CAPITAL LETTER I WITH DOT ABOVE
When lowercasing, remove dot_above in the sequence I + dot_above, which will turn into i. This matches the behavior of the canonically equivalent I-dot_above
0307; ; 0307; 0307; tr After_I; # COMBINING DOT ABOVE
0307; ; 0307; 0307; az After_I; # COMBINING DOT ABOVE
When lowercasing, unless an I is before a dot_above, it turns into a dotless i.
0049; 0131; 0049; 0049; tr Not_Before_Dot; # LATIN CAPITAL LETTER I
0049; 0131; 0049; 0049; az Not_Before_Dot; # LATIN CAPITAL LETTER I
When uppercasing, i turns into a dotted capital I
0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I
0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I
Note: the following case is already in the UnicodeData.txt file.
0131; 0131; 0049; 0049; tr; # LATIN SMALL LETTER DOTLESS I
EOF
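The Turkish/Azeri rules above are small enough to hand-roll. Here's a minimal sketch (the function names are hypothetical, and the context conditions are simplified to the adjacent character) that layers them on top of the default mappings. It's roughly the extra work a tr-aware toLocaleLowerCase() has to do, and exactly what toLowerCase() must not do.

```javascript
// Hypothetical helpers applying the tr/az rules from SpecialCasing.txt on top of
// the default mappings. Simplification (assumption): the After_I and
// Not_Before_Dot contexts are checked against the adjacent character only,
// ignoring any combining marks that might sit in between.
const CAPITAL_I = "I";
const DOTTED_CAPITAL_I = "\u0130";    // İ
const DOTLESS_SMALL_I = "\u0131";     // ı
const COMBINING_DOT_ABOVE = "\u0307";

function turkishToLower(str) {
  const chars = Array.from(str); // split by code point, not code unit
  let out = "";
  for (let k = 0; k < chars.length; k++) {
    const ch = chars[k];
    if (ch === DOTTED_CAPITAL_I) {
      out += "i"; // 0130 -> 0069
    } else if (ch === CAPITAL_I) {
      // 0049 -> 0131 (Not_Before_Dot); I followed by a dot above becomes a plain i
      out += chars[k + 1] === COMBINING_DOT_ABOVE ? "i" : DOTLESS_SMALL_I;
    } else if (ch === COMBINING_DOT_ABOVE && chars[k - 1] === CAPITAL_I) {
      continue; // 0307 after I is dropped (After_I)
    } else {
      // Default mapping, applied per character: other context-sensitive rules
      // (e.g. Greek final sigma) are not reproduced here.
      out += ch.toLowerCase();
    }
  }
  return out;
}

function turkishToUpper(str) {
  // 0069 -> 0130; everything else (including ı -> I) uses the default mapping.
  return Array.from(str, (ch) => (ch === "i" ? DOTTED_CAPITAL_I : ch.toUpperCase())).join("");
}

console.log(turkishToLower("SCRIPT"), turkishToUpper("script")); // scrıpt SCRİPT
```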
Also, as per JavaScript for Absolute Beginners (by Terry McNavage):
> "I".toLowerCase() // "i"
> "i".toUpperCase() // "I"
> "I".toLocaleLowerCase() // "<dotless-i>"
> "i".toLocaleUpperCase() // "<dotted-I>"
Note: toLocaleLowerCase() and toLocaleUpperCase() convert case based on your OS settings. You'd have to change those settings to Turkish for the previous sample to work. Or just take my word for it!
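On engines with ECMA-402 support you can avoid touching OS settings by passing the locale explicitly. The sketch below (assuming a recent Node.js or browser) prints an approximation of the host's default locale, which is what the argument-less calls fall back to, and compares the argument-less result with an explicit "tr" call. Using Intl.DateTimeFormat to read the default locale is my choice of probe, not something from the sources above.

```javascript
// Approximation of the host's default locale, which argument-less
// toLocaleLowerCase()/toLocaleUpperCase() fall back to.
const defaultLocale = new Intl.DateTimeFormat().resolvedOptions().locale;

// Argument-less call: the result depends on the machine's locale settings.
const hostLower = "I".toLocaleLowerCase();
// Explicit locale: the same result on any engine with Turkish case-mapping support.
const turkishLower = "I".toLocaleLowerCase("tr");

console.log(`default locale: ${defaultLocale}`);
console.log(`host: ${hostLower}, tr: ${turkishLower}`);
console.log(hostLower === turkishLower ? "host uses Turkish-style casing" : "host does not use Turkish-style casing");
```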
And as per bobince's comment on the "Convert JavaScript String to be all lower case?" question:
Accept-Language and navigator.language are two completely separate settings. Accept-Language reflects the user's chosen preferences for what languages they want to receive in web pages (and this setting is unfortunately inaccessible to JS). navigator.language merely reflects which localisation of the web browser was installed, and should generally not be used for anything. Both of these values are unrelated to the system locale, which is the bit that decides what toLocaleLowerCase() will do; that's an OS-level setting out of scope of the browser's prefs.
So, setting lang="tr-TR" on the html element won't reproduce a real test case, since the special casing example depends on an OS-level setting.
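I think that, with toLowerCase() or toUpperCase(), the only Turkish-related special cases are lowercasing dotted-I (U+0130) and uppercasing dotless-i (U+0131), and even those follow the fixed mappings quoted above rather than the locale.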
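Those two characters can be checked directly with the plain methods. Per the unconditional SpecialCasing entry quoted above, toLowerCase() maps U+0130 to "i" followed by U+0307 COMBINING DOT ABOVE (two code points, not a plain "i"), while U+0131 uppercases to a plain "I" via UnicodeData.txt. A quick sketch that prints the code points so the combining dot is visible (the codePoints helper is just for illustration):

```javascript
const DOTTED_CAPITAL_I = "\u0130"; // İ
const DOTLESS_SMALL_I = "\u0131";  // ı

const lowerOfDotted = DOTTED_CAPITAL_I.toLowerCase();
const upperOfDotless = DOTLESS_SMALL_I.toUpperCase();

// Illustrative helper: list code points as U+XXXX strings.
const codePoints = (s) =>
  Array.from(s, (c) => "U+" + c.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));

console.log(codePoints(lowerOfDotted));  // [ 'U+0069', 'U+0307' ]
console.log(codePoints(upperOfDotless)); // [ 'U+0049' ]

// Round-tripping is lossy: neither result converts back to the original character.
console.log(lowerOfDotted.toUpperCase() === DOTTED_CAPITAL_I); // false
console.log(upperOfDotless.toLowerCase() === DOTLESS_SMALL_I); // false
```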
Based on those credible, official sources, I think you're right: 'i' !== 'I'.toLowerCase() should always evaluate to false.
But, as I said, I couldn't test it here.
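For anyone who wants to run the check outside a browser, here is a sketch of a Node.js-friendly variant of the question's snippet (no document.write; the function names are hypothetical). It reports whether the plain methods map I/i the locale-independent way and, for contrast, what the locale-aware methods return for a few explicit locales, assuming ECMA-402 support.

```javascript
// Hypothetical helper: true if the plain methods map I/i the locale-independent way.
function plainCaseConversionIsInvariant() {
  return "I".toLowerCase() === "i" && "i".toUpperCase() === "I";
}

// For contrast: the locale-aware methods with explicit locales.
function localeAwareResults(locales) {
  return locales.map((locale) => ({
    locale,
    lower: "I".toLocaleLowerCase(locale),
    upper: "i".toLocaleUpperCase(locale),
  }));
}

// Report the user agent in browsers, or the Node.js version otherwise.
const env = typeof navigator !== "undefined" ? navigator.userAgent : `Node.js ${process.version}`;
console.log(env, "invariant:", plainCaseConversionIsInvariant());
console.table(localeAwareResults(["en-US", "tr", "az", "lt"]));
```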