Is there any reason to prefer UTF-16 over UTF-8?

Examining the attributes of UTF-16 and UTF-8, I can't find any reason to prefer UTF-16.

However, checking out Java and C#, it looks like strings and chars there default to UTF-16. I was thinking that it might be for historic reasons, or perhaps for performance reasons, but couldn't find any information.

Does anyone know why these languages chose UTF-16? And is there any valid reason for me to do the same?

EDIT: Meanwhile I've also found this answer, which seems relevant and has some interesting links.

414

asked May 29 '10 11:05

Oak

1 Answer

East Asian text typically requires less storage in UTF-16 than in UTF-8: 2 bytes are enough for 99% of East Asian characters in UTF-16, whereas UTF-8 typically requires 3 bytes for each of them.

Of course, for Western languages, UTF-8 is usually smaller (1 byte instead of 2). For mixed files like HTML (where there's a lot of markup) it's much of a muchness.
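To make the size comparison concrete, here is a minimal Java sketch that counts the encoded bytes of a few sample strings. The class name `EncodedSize`, its helper methods, and the sample strings are made up for illustration; only `String.getBytes` and `StandardCharsets` are standard library calls. UTF-16LE is used so that no byte-order mark is added to the count.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not from the original answer.
class EncodedSize {
    // Bytes needed to store the text as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed to store the text as UTF-16 (little-endian, no BOM).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    public static void main(String[] args) {
        String ascii = "hello";                        // Western text, ASCII only
        String cjk = "\u65E5\u672C\u8A9E";             // three CJK characters in the BMP
        String html = "<p>\u65E5\u672C\u8A9E</p>";     // the same text wrapped in markup

        System.out.println(utf8Bytes(ascii) + " vs " + utf16Bytes(ascii)); // 5 vs 10
        System.out.println(utf8Bytes(cjk) + " vs " + utf16Bytes(cjk));     // 9 vs 6
        System.out.println(utf8Bytes(html) + " vs " + utf16Bytes(html));   // 16 vs 20
    }
}
```

In this tiny sample the markup already tips the balance back towards UTF-8; for real documents the result depends on the ratio of markup to text.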

Processing UTF-16 in user-mode applications is slightly easier than processing UTF-8, because surrogate pairs behave in almost the same way as combining characters. So UTF-16 can usually be processed as if it were a fixed-size encoding.
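As a rough illustration of the surrogate-pair point, the sketch below shows how a Java `String` counts UTF-16 code units rather than characters, and how a code point outside the Basic Multilingual Plane and a combining sequence both span two `char` values. The class `SurrogateDemo` and its method names are hypothetical; the `String` and `Character` calls are standard.

```java
// Hypothetical demo class, not from the original answer.
class SurrogateDemo {
    // Number of UTF-16 code units (what String.length() reports).
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // True if cutting the string at this index would separate
    // a high surrogate from the low surrogate that follows it.
    static boolean splitsSurrogatePair(String s, int index) {
        return index > 0 && index < s.length()
                && Character.isHighSurrogate(s.charAt(index - 1))
                && Character.isLowSurrogate(s.charAt(index));
    }

    public static void main(String[] args) {
        String emoji = "a\uD83D\uDE00b";   // 'a', U+1F600 (one surrogate pair), 'b'
        String accent = "e\u0301";         // 'e' followed by a combining acute accent

        System.out.println(codeUnits(emoji));               // 4
        System.out.println(codePoints(emoji));              // 3
        System.out.println(splitsSurrogatePair(emoji, 2));  // true
        System.out.println(codeUnits(accent));              // 2
        System.out.println(codePoints(accent));             // 2
    }
}
```

Code that treats each `char` as a fixed-size unit works as long as it never cuts or reorders in the middle of such a sequence, which is the same care combining characters already require.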

197

answered Sep 23 '22 03:09

Dean Harding

Tags: java, c#, unicode, utf-8, utf-16
