What version of Unicode is supported by which .NET platform and on which version of Windows with regard to character classes?

Updated question ¹

With regard to character classes, comparison, sorting, normalization and collations, what Unicode version or versions are supported by which .NET platforms?

Original question

I remember somewhat vaguely having read that .NET supported Unicode version 3.0 and that the internal UTF-16 encoding is not really UTF-16 but actually uses UCS-2, which is not the same. It seems, for instance, that characters above U+FFFF are not possible, i.e. consider:

string s = "\u1D7D9"; // ("Mathematical double-struck digit one")

and it stores the string "ᵽ9".

I'm basically looking for definitive references or answers to the following:

If it isn't true UTF-16 in .NET, what is it?
What version of Unicode is supported by .NET?
If recent versions are not supported or planned in the near future, does anybody know of a (non)commercial library, or how I can work around this issue?

¹) I updated the question because, with the passing of time, the new wording seems more appropriate with respect to the answers and to the larger community. I left the original question in place; parts of it have been answered in the comments. Also, the old UCS-2 (no surrogates) was used in now-ancient 32-bit Windows versions, whereas .NET has always used UTF-16 (with surrogates) internally.

asked Feb 06 '12 15:02

Abel

2 Answers

Internally, .NET is UTF-16. In some cases, e.g. when ASP.NET writes to a response, by default it uses UTF-8. Both of them can handle higher planes.

The reason people sometimes refer to .NET as UCS2 is (I think, because I see few other reasons) that Char is strictly 16 bit and a single Char can't be used to represent the upper planes. Char does, however, have static method overloads (e.g. Char.IsLetter) that can operate on high plane UTF-16 characters inside a string. Strings are stored as true UTF-16.
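To illustrate the point about the string-based overloads, here is a minimal sketch (the class name is made up for the example). It inspects the character from the question through the `(string, int)` overloads rather than through a single Char. The reported Unicode category depends on the character data shipped with the runtime, so no output is claimed for that line.

```csharp
using System;
using System.Globalization;

class SurrogateDemo // hypothetical name, for illustration only
{
    static void Main()
    {
        // U+1D7D9 MATHEMATICAL DOUBLE-STRUCK DIGIT ONE, stored as a surrogate pair.
        string s = "\U0001D7D9";

        // One code point, but two UTF-16 code units.
        Console.WriteLine(s.Length);                                  // 2

        // The (string, int) overloads look at the pair starting at the given index.
        Console.WriteLine(char.IsSurrogatePair(s, 0));                // True
        Console.WriteLine(char.ConvertToUtf32(s, 0).ToString("X"));   // 1D7D9

        // Character class as known to this runtime's Unicode data (version dependent).
        Console.WriteLine(CharUnicodeInfo.GetUnicodeCategory(s, 0));
    }
}
```

A single Char taken from this string (s[0] or s[1]) is only half of the pair, which is why the classification has to go through the string overloads.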

You can address high Unicode codepoints directly using uppercase \U - e.g. "\U0001D7D9" - but again, only inside strings, not chars.
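A small sketch of the difference between the two escapes, which also explains the "ᵽ9" result in the question (the class name is again just for the example). As far as I know, \u always consumes exactly four hex digits and \U exactly eight.

```csharp
using System;

class EscapeDemo // hypothetical name, for illustration only
{
    static void Main()
    {
        // \u takes exactly four hex digits, so this is U+1D7D followed by the digit '9'.
        string fourDigit = "\u1D7D9";

        // \U takes exactly eight hex digits and can express code points above U+FFFF.
        string eightDigit = "\U0001D7D9";

        Console.WriteLine(fourDigit.Length);                      // 2 (two separate BMP characters)
        Console.WriteLine(fourDigit[1]);                          // 9
        Console.WriteLine(eightDigit.Length);                     // 2 (one code point, two code units)
        Console.WriteLine(char.IsHighSurrogate(eightDigit[0]));   // True

        // The same string can be built from the numeric code point.
        Console.WriteLine(eightDigit == char.ConvertFromUtf32(0x1D7D9)); // True

        // A char literal cannot hold this code point:
        // char c = '\U0001D7D9';   // does not compile
    }
}
```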

As for Unicode version, from the MSDN documentation:

"In the .NET Framework 4, sorting, casing, normalization, and Unicode character information is synchronized with Windows 7 and conforms to the Unicode 5.1 standard."

Update 1: It's worth noting, however, that this does not imply that the entirety of Unicode 5.1 is supported, either in Windows 7 or in .NET 4.0.

Windows 8 targets Unicode 6.0 - I'm guessing that .NET Framework 4.5 might synchronize with that, but have found no sources confirming it. And once again, that doesn't mean the entire standard is implemented.

Update 2: This note on Roslyn confirms that the underlying platform defines the Unicode support for the compiler, and in the link to the code it explains that C# 6.0 supports Unicode 6.0 and up (with a breaking change for C# identifiers as a result).

Update 3: Since .NET version 4.5, a new class SortVersion lets you query the supported Unicode version: read the FullVersion property of the SortVersion instance returned by CompareInfo.Version. On the same page, Microsoft explains that .NET 4.0 supports Unicode 5.0 on all platforms and that .NET 4.5 supports Unicode 5.0 on Windows 7 and Unicode 6.0 on Windows 8. This differs slightly from the official "what is new" statement, which talks of version 5.x and 6.0 respectively. From my own (editor: Abel) experience, in most cases it seems that in .NET 4.0, Unicode 5.1 is supported at least for character classes, but I didn't test sorting, normalization and collations. This seems in line with what is said in MSDN as quoted above.
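A minimal sketch of querying the sort version at run time, assuming .NET Framework 4.5 or later (the class name is made up for the example). To my knowledge a SortVersion instance is obtained from CompareInfo.Version. The values printed vary by runtime and operating system, so no output is shown, and the sketch does not attempt to decode the number into a Unicode version.

```csharp
using System;
using System.Globalization;

class SortVersionDemo // hypothetical name, for illustration only
{
    static void Main()
    {
        // CompareInfo.Version returns the SortVersion used for comparison and sorting.
        SortVersion version = CultureInfo.InvariantCulture.CompareInfo.Version;

        // FullVersion is a platform-dependent number; SortId identifies the sort order.
        Console.WriteLine(version.FullVersion);
        Console.WriteLine(version.SortId);
    }
}
```

Comparing two SortVersion values (for example one stored alongside a sorted index and the current one) is one way to detect that sorted data may need to be rebuilt after a platform upgrade.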

answered Sep 20 '22 07:09

JimmiTh

That character is supported. One thing to note is that for Unicode characters above U+FFFF (which need more than one 16-bit code unit), you must declare them with an uppercase '\U' followed by eight hex digits, like this:

string text = "\U0001D7D9";

If you create a WPF app with that character in a text block, it should render the double-struck digit one correctly.

answered Sep 22 '22 07:09

Joe Strommen

Tags: c#, .net, utf-16, astral-plane, ucs2