What does "The .NET framework uses the UTF-16 encoding standard by default" mean?

1 Answer

“UTF-16” is an annoying term, as it has two meanings which are easily confused.

The first meaning is a sequence of 16-bit code units. Most of these correspond directly to the Unicode code point of the same number; characters outside the Basic Multilingual Plane (U+10000 upwards) are stored as two 16-bit code units, each one taken from the surrogate range (U+D800 to U+DFFF).
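As a rough illustration of that mapping, here is a minimal Python sketch. Python is used only as a neutral stand-in here, and the helper name `utf16_code_units` is made up for this example; it is not part of any library.

```python
def utf16_code_units(code_point):
    """Return the list of 16-bit UTF-16 code units for one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates are not characters in their own right")
    if code_point < 0x10000:
        # Inside the Basic Multilingual Plane: one unit, same number.
        return [code_point]
    # Outside the BMP: split the 20-bit offset across a surrogate pair.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [high, low]


bmp_units = utf16_code_units(0x00E9)      # U+00E9, inside the BMP
astral_units = utf16_code_units(0x1F600)  # U+1F600, outside the BMP

print([hex(u) for u in bmp_units])     # ['0xe9']
print([hex(u) for u in astral_units])  # ['0xd83d', '0xde00']
```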

Many languages use UTF-16 in this sense for internal storage, including as a native string type. This is the usual source of phrases like “.NET (or Java) uses UTF-16 as its default encoding”. .NET accesses the elements of such a UTF-16 string 16 bits at a time (i.e., at the implementation level, as a uint16).
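One practical consequence is that a string type built this way counts and indexes 16-bit units, not characters. Python strings are not stored like that, so the sketch below simulates the view by encoding to UTF-16 and unpacking 16-bit values; `as_utf16_units` is a hypothetical helper written for this example.

```python
import struct


def as_utf16_units(text):
    """Return text as the list of 16-bit units a UTF-16 string type would hold."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


text = "a\U0001F600"  # 'a' followed by U+1F600, which lies outside the BMP
units = as_utf16_units(text)

print(len(text))                # 2: Python counts code points
print(len(units))               # 3: a UTF-16 string type would report this length
print([hex(u) for u in units])  # ['0x61', '0xd83d', '0xde00']
```

Indexing such a string at position 1 or 2 would yield half of a surrogate pair, which is why code that walks a UTF-16 string one element at a time has to handle surrogates explicitly.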

The next thing to consider is the encoding of such a UTF-16 string into linear bytes, for storage in a file or network stream. As always when you store larger numbers into bytes, there are two possible encodings: little-endian or big-endian. So you can use “UTF-16LE”, the little-endian encoding of UTF-16 into bytes, or “UTF-16BE”, the big-endian encoding.
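To make the two byte orders concrete, this short Python sketch encodes the same two-character string both ways using the standard `utf-16-le` and `utf-16-be` codecs. The sample text is arbitrary.

```python
text = "A\u20AC"  # U+0041 followed by U+20AC (euro sign)

little = text.encode("utf-16-le")  # low byte of each 16-bit unit first
big = text.encode("utf-16-be")     # high byte of each 16-bit unit first

print(little.hex(" "))  # 41 00 ac 20
print(big.hex(" "))     # 00 41 20 ac
```

The 16-bit units are identical in both cases (0x0041, 0x20AC); only the order of the two bytes within each unit differs.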

(“UTF-16LE” is the more commonly used. Just to add to the confusion, Windows gives it the deeply misleading and ambiguous encoding name “Unicode”. In reality it is almost always better to use UTF-8 for file storage and network streams than either of UTF-16LE/BE.)

But if you don't know whether a bunch of bytes contains “UTF-16LE” or “UTF-16BE”, you can work it out by looking at the first code point. This code point, the Byte Order Mark (BOM, U+FEFF), is only valid when read one way around: read with the wrong byte order it comes out as U+FFFE, which is not a character, so you can't mistake one encoding for the other.

This approach, of not caring what byte order you have but using a BOM to signal it, is usually referred to under the encoding name... “UTF-16”.
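The BOM check can be sketched in a few lines of Python. The function name `decode_utf16_with_bom` and its fallback to little-endian when no BOM is present are assumptions made for this example, not a rule from any standard library.

```python
import codecs


def decode_utf16_with_bom(data, default="utf-16-le"):
    """Decode UTF-16 bytes, using a leading BOM to choose the byte order."""
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return data[2:].decode("utf-16-be")
    # No BOM: the byte order is a guess (an assumption of this sketch).
    return data.decode(default)


le_bytes = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
be_bytes = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")

print(decode_utf16_with_bom(le_bytes))  # hi
print(decode_utf16_with_bom(be_bytes))  # hi
```

Python's own codec named `utf-16` behaves in this second sense as well: it accepts either byte order on decoding when a BOM is present, so `be_bytes.decode("utf-16")` also gives `hi`.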

So, when someone says “UTF-16”, you can't tell whether they mean a sequence of 16-bit code units, or a sequence of bytes in unspecified order that will decode to one.

(“UTF-32” has the same problem.)

“If you don't know what encoding to use when you create a file, don't specify one and .NET will use UTF16”

If that's the actual direct quote, it is a lie. Constructing a StreamWriter without an encoding argument is explicitly specified to give you UTF-8.

answered Oct 05 '22 17:10

bobince


Tags: c#, .net, stream, encoding

J M
