In what JS engines, specifically, are toLowerCase & toUpperCase locale-sensitive?

Tags: javascript, angularjs, unicode, internationalization, turkish

In the code of some libraries (e.g. AngularJS), I can see that custom case-conversion functions are used instead of the standard ones. The justification given is the assumption that in browsers with a Turkish locale, the standard functions don't work as expected:

console.log("SCRIPT".toLowerCase()); // "scrıpt"
console.log("script".toUpperCase()); // "SCRİPT"

But is this really the case, or was it ever? Do browsers really behave this way? If so, which ones? What about node.js? Other JS engines?

The existence of the toLocaleLowerCase and toLocaleUpperCase methods implies that toLowerCase and toUpperCase are locale-invariant, doesn't it?

For what browsers, specifically, does the Angular team retain this check in the code: if ('i' !== 'I'.toLowerCase())...?
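For context, the guarded fallback pattern being asked about can be sketched roughly as follows. This is an illustrative reconstruction, not AngularJS's actual source; the names asciiLowercase and safeLowercase are hypothetical.

```javascript
// Hypothetical helper: lowercases only the ASCII letters A-Z, so the result
// cannot depend on any locale setting. Setting bit 0x20 turns an ASCII
// uppercase letter into its lowercase counterpart.
function asciiLowercase(s) {
  return s.replace(/[A-Z]/g, (ch) => String.fromCharCode(ch.charCodeAt(0) | 32));
}

// Hypothetical wrapper: use the manual version only if the built-in method
// fails the sanity check quoted in the question.
const safeLowercase =
  'i' !== 'I'.toLowerCase()
    ? asciiLowercase
    : (s) => s.toLowerCase();

console.log(asciiLowercase('SCRIPT')); // "script"
console.log(safeLowercase('SCRIPT'));  // "script" on either branch
```

Note that the manual branch leaves non-ASCII letters untouched, which is the trade-off such a fallback accepts.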

If your browser (device) uses the Turkish or Azerbaijani locale, please run this snippet and let me know if you discover that the issue indeed exists.

if ('i' !== 'I'.toLowerCase()) {
  document.write('Ooops! toLowerCase is locale-sensitive in your browser. ' +
    'Please write your user-agent in the comments to this question: ' +
    navigator.userAgent);
} else {
  document.write('toLowerCase isn\'t locale-sensitive in your browser. ' +
    'Everything works as expected!');
}

<html lang="tr">

393

asked Mar 01 '15 09:03

thorn0

1 Answers

Note: I couldn't test this myself!

As per ECMAScript specification:

String.prototype.toLowerCase ( )

[...]

For the purposes of this operation, the 16-bit code units of the Strings are treated as code points in the Unicode Basic Multilingual Plane. Surrogate code points are directly transferred from S to L without any mapping.

The result must be derived according to the case mappings in the Unicode character database (this explicitly includes not only the UnicodeData.txt file, but also the SpecialCasings.txt file that accompanies it in Unicode 2.1.8 and later).

[...]

String.prototype.toLocaleLowerCase ( )

This function works exactly the same as toLowerCase except that its result is intended to yield the correct result for the host environment’s current locale, rather than a locale-independent result. There will only be a difference in the few cases (such as Turkish) where the rules for that language conflict with the regular Unicode case mappings.

[...]
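The distinction drawn in the two spec excerpts above can be illustrated with a minimal sketch. It assumes an engine that supports the optional locale argument from ECMA-402 (for example a recent Node.js built with ICU), so that the Turkish rules can be requested explicitly instead of changing the host locale; the variable names are only for illustration.

```javascript
// Locale-independent methods: the spec text above calls for the same
// result regardless of the host environment's locale.
console.log('I'.toLowerCase()); // "i"
console.log('i'.toUpperCase()); // "I"

// Locale-sensitive methods, with the locale passed explicitly rather than
// taken from the host default (assumes ECMA-402 support in the engine).
const trLower = 'I'.toLocaleLowerCase('tr');
const trUpper = 'i'.toLocaleUpperCase('tr');

// Print the code points so the dotless/dotted forms are unambiguous.
console.log(trLower.codePointAt(0).toString(16)); // "131" (dotless i)
console.log(trUpper.codePointAt(0).toString(16)); // "130" (dotted capital I)
```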

And as per Unicode Character Database Special Casing:

[...]

Format

The entries in this file are in the following machine-readable format:

<code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>

Unconditional mappings

[...]

Preserve canonical equivalence for I with dot. Turkic is handled below.

0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE

[...]

Language-Sensitive Mappings

These are characters whose full case mappings depend on language and perhaps also context (which characters come before or after). For more information see the header of this file and the Unicode Standard.

Lithuanian

Lithuanian retains the dot in a lowercase i when followed by accents.

Remove DOT ABOVE after "i" with upper or titlecase

0307; 0307; ; ; lt After_Soft_Dotted; # COMBINING DOT ABOVE

Introduce an explicit dot above when lowercasing capital I's and J's whenever there are more accents above. (of the accents used in Lithuanian: grave, acute, tilde above, and ogonek)

0049; 0069 0307; 0049; 0049; lt More_Above; # LATIN CAPITAL LETTER I

004A; 006A 0307; 004A; 004A; lt More_Above; # LATIN CAPITAL LETTER J

012E; 012F 0307; 012E; 012E; lt More_Above; # LATIN CAPITAL LETTER I WITH OGONEK

00CC; 0069 0307 0300; 00CC; 00CC; lt; # LATIN CAPITAL LETTER I WITH GRAVE

00CD; 0069 0307 0301; 00CD; 00CD; lt; # LATIN CAPITAL LETTER I WITH ACUTE

0128; 0069 0307 0303; 0128; 0128; lt; # LATIN CAPITAL LETTER I WITH TILDE

Turkish and Azeri

I and i-dotless; I-dot and i are case pairs in Turkish and Azeri. The following rules handle those cases.

0130; 0069; 0130; 0130; tr; # LATIN CAPITAL LETTER I WITH DOT ABOVE

0130; 0069; 0130; 0130; az; # LATIN CAPITAL LETTER I WITH DOT ABOVE

When lowercasing, remove dot_above in the sequence I + dot_above, which will turn into i. This matches the behavior of the canonically equivalent I-dot_above

0307; ; 0307; 0307; tr After_I; # COMBINING DOT ABOVE

0307; ; 0307; 0307; az After_I; # COMBINING DOT ABOVE

When lowercasing, unless an I is before a dot_above, it turns into a dotless i.

0049; 0131; 0049; 0049; tr Not_Before_Dot; # LATIN CAPITAL LETTER I

0049; 0131; 0049; 0049; az Not_Before_Dot; # LATIN CAPITAL LETTER I

When uppercasing, i turns into a dotted capital I

0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I

0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I

Note: the following case is already in the UnicodeData.txt file.

0131; 0131; 0049; 0049; tr; # LATIN SMALL LETTER DOTLESS I

EOF
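To make the record format quoted above easier to read, here is a minimal sketch of a parser for a single data line. The function names fromHexList and parseSpecialCasingLine are hypothetical, and the sketch assumes the line follows the format shown in the excerpt exactly (it does not handle blank or comment-only lines).

```javascript
// Convert a space-separated list of hex code points ("0069 0307") to a string.
function fromHexList(field) {
  const hex = field.trim();
  if (hex === '') return '';
  return String.fromCodePoint(...hex.split(/\s+/).map((h) => parseInt(h, 16)));
}

// Hypothetical parser for one line of the form:
// <code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>
function parseSpecialCasingLine(line) {
  const hashAt = line.indexOf('#');
  const data = hashAt === -1 ? line : line.slice(0, hashAt);
  const comment = hashAt === -1 ? '' : line.slice(hashAt + 1).trim();
  // Splitting on ';' leaves an empty trailing field, which is ignored.
  const [code, lower, title, upper, conditions = ''] =
    data.split(';').map((f) => f.trim());
  return {
    code: fromHexList(code),
    lower: fromHexList(lower),
    title: fromHexList(title),
    upper: fromHexList(upper),
    conditions: conditions === '' ? [] : conditions.split(/\s+/),
    comment,
  };
}

const rule = parseSpecialCasingLine(
  '0049; 0131; 0049; 0049; tr Not_Before_Dot; # LATIN CAPITAL LETTER I'
);
console.log(rule.conditions); // [ 'tr', 'Not_Before_Dot' ]
console.log(rule.lower === '\u0131'); // true: lowercase is dotless i
```

A rule with an empty conditions list is unconditional, so only those rules would be relevant to toLowerCase and toUpperCase.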

Also, as per JavaScript for Absolute Beginners (by Terry McNavage):

> "I".toLowerCase() // "i"
> "i".toUpperCase() // "I"
> "I".toLocaleLowerCase() // "<dotless-i>"
> "i".toLocaleUpperCase() // "<dotted-I>"

Note: toLocaleLowerCase() and toLocaleUpperCase() convert case based on your OS settings. You'd have to change those settings to Turkish for the previous sample to work. Or just take my word for it!

And as per bobince's comment on the question "Convert JavaScript String to be all lower case?":

Accept-Language and navigator.language are two completely separate settings. Accept-Language reflects the user's chosen preferences for what languages they want to receive in web pages (and this setting is unfortunately inaccessible to JS). navigator.language merely reflects which localisation of the web browser was installed, and should generally not be used for anything. Both of these values are unrelated to the system locale, which is the bit that decides what toLocaleLowerCase() will do; that's an OS-level setting out of scope of the browser's prefs.

So setting lang="tr-TR" on the html element won't produce a realistic test case, since reproducing the special-casing example requires an OS-level setting.

I think that only lowercasing dotted-I or uppercasing dotless-i would be locale specific when using toLowerCase() or toUpperCase().
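The unconditional mappings quoted earlier suggest what the locale-independent methods should do with the two Turkish-specific letters. The following is a minimal sketch that prints the resulting code points; the helper name hex is hypothetical, and the commented results are what a spec-conforming engine is expected to produce, not something verified on a Turkish-locale system.

```javascript
const dottedCapitalI = '\u0130'; // LATIN CAPITAL LETTER I WITH DOT ABOVE
const dotlessSmallI = '\u0131';  // LATIN SMALL LETTER DOTLESS I

// Hypothetical helper: render a string as space-separated hex code points.
const hex = (s) =>
  Array.from(s, (c) => c.codePointAt(0).toString(16).padStart(4, '0')).join(' ');

// Unconditional rule "0130; 0069 0307; ...": i followed by COMBINING DOT ABOVE.
console.log(hex(dottedCapitalI.toLowerCase())); // "0069 0307"

// UnicodeData.txt maps dotless i to plain capital I.
console.log(hex(dotlessSmallI.toUpperCase())); // "0049"

// The check from the question.
console.log('i' !== 'I'.toLowerCase()); // false
```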

As per those credible/official sources, I think you're right: 'i' !== 'I'.toLowerCase() would always evaluate to false.

But, as I said, I couldn't test it here.

141

answered Sep 20 '22 13:09

falsarella

