UTF-8 or UTF-16 or UTF-32 or UCS-2

2 Answers

Quick note: basically everything can be represented in the Unicode character set. UTF-8 is just one of several encodings able to represent every character in that set; UTF-16 and UTF-32 can as well.

UCS-2 isn't really something to use anymore: it can't represent characters beyond U+FFFF, the end of the Basic Multilingual Plane.
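A minimal Python sketch of why (Python is used only for illustration; the question itself is tagged C#). A code point above U+FFFF, here U+1F600 picked as an arbitrary example, does not fit in one 16-bit unit. UTF-16 handles it with a surrogate pair, which UCS-2 has no notion of. The helper `fits_in_ucs2` is hypothetical.

```python
def fits_in_ucs2(text: str) -> bool:
    """Hypothetical helper: True if every code point fits in one 16-bit unit.

    Surrogate code points (U+D800..U+DFFF) are excluded too, because UCS-2
    gives them no meaning on their own.
    """
    return all(ord(ch) <= 0xFFFF and not 0xD800 <= ord(ch) <= 0xDFFF for ch in text)


emoji = "\U0001F600"               # U+1F600, outside the Basic Multilingual Plane
utf16 = emoji.encode("utf-16-be")  # big-endian, no byte-order mark

print(fits_in_ucs2("abc"))  # True
print(fits_in_ucs2(emoji))  # False
print(utf16.hex())          # d83dde00: high surrogate D83D + low surrogate DE00
```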

Which of the remaining three you pick depends on what kind of operations you want to do on the text. UTF-8 will usually (not always!) take up less space on disk for the same data, and it is a strict superset of ASCII, so it may reduce the amount of transcoding needed. However, because it is variable-width, you can't index into a UTF-8 string by code point, or find its length in code points, in constant time.
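A small Python sketch of both points, under the assumption that the input is already valid UTF-8. ASCII text produces identical bytes in ASCII and UTF-8, but finding the code-point count or the byte offset of the n-th code point needs a linear scan, because each code point takes 1 to 4 bytes. The functions `utf8_code_point_count` and `utf8_byte_offset` are hypothetical names for illustration; they rely on the fact that UTF-8 continuation bytes always have the bit pattern `10xxxxxx`.

```python
def utf8_code_point_count(data: bytes) -> int:
    """Count code points in valid UTF-8 in O(n).

    Every code point has exactly one byte that is not a continuation
    byte (continuation bytes match 10xxxxxx).
    """
    return sum(1 for b in data if b & 0xC0 != 0x80)


def utf8_byte_offset(data: bytes, index: int) -> int:
    """Byte offset of the index-th code point; also a linear scan."""
    seen = -1
    for offset, b in enumerate(data):
        if b & 0xC0 != 0x80:  # this byte starts a new code point
            seen += 1
            if seen == index:
                return offset
    raise IndexError(index)


ascii_text = "hello"
print(ascii_text.encode("ascii") == ascii_text.encode("utf-8"))  # True

mixed = "na\u00efve \u4e2d\u6587".encode("utf-8")  # "naïve 中文"
print(len(mixed), utf8_code_point_count(mixed))     # 13 8
print(utf8_byte_offset(mixed, 7))                   # 10 (start of U+6587)
```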

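UTF-32 does let you find the length of a string and index into it in constant time, counted in code points (a user-perceived character such as a letter plus a combining accent can still span several code points). Unlike UTF-8, it isn't a superset of ASCII. It also requires 4 bytes for every code point, but hey, disk space is cheap.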
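A minimal Python sketch of the constant-time claim, assuming little-endian UTF-32 without a byte-order mark. Since every code point occupies exactly 4 bytes, the length is the byte count divided by 4 and the n-th code point starts at byte `4 * n`, with no scanning. The helper names are hypothetical.

```python
def utf32_length(data: bytes) -> int:
    """Code-point count of UTF-32 data in O(1): each code point is 4 bytes."""
    return len(data) // 4


def utf32_code_point_at(data: bytes, index: int) -> str:
    """Return the index-th code point in O(1) by jumping to byte 4 * index."""
    if not 0 <= index < utf32_length(data):
        raise IndexError(index)
    unit = data[4 * index : 4 * index + 4]
    return chr(int.from_bytes(unit, "little"))


text = "na\u00efve \u4e2d\u6587\U0001F600"
data = text.encode("utf-32-le")  # little-endian, no byte-order mark

print(utf32_length(data))                            # 9
print(utf32_code_point_at(data, 8) == "\U0001F600")  # True
print("A".encode("utf-32-le"))                       # b'A\x00\x00\x00', not plain ASCII
```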

answered Sep 22 '22 18:09

habnabit

UTF-8 and UTF-16 are both good choices. They both give you access to the full range of Unicode code points without using up 4 bytes for every character.
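To make the size difference concrete, a short Python sketch that encodes one sample character from several ranges and reports the byte count in each encoding. The sample characters and the helper `byte_sizes` are arbitrary choices for illustration; the `-le` codec variants are used so Python does not prepend a byte-order mark.

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

samples = {
    "A": "U+0041 (ASCII)",
    "\u00e9": "U+00E9 (Latin-1)",
    "\u4e2d": "U+4E2D (CJK)",
    "\U0001F600": "U+1F600 (outside the BMP)",
}


def byte_sizes(ch: str) -> dict:
    """Number of bytes each encoding uses for the given text."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}


for ch, label in samples.items():
    print(label, byte_sizes(ch))
```

For these samples UTF-8 uses 1, 2, 3 and 4 bytes, UTF-16 uses 2, 2, 2 and 4, and UTF-32 always uses 4.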

Your choice will be influenced by the language you're using and its support for these formats. I believe UTF-8 plays best with ASP.NET overall, but it will depend on what you're doing.

UTF-8 is often a good choice overall because it plays well with code that expects only ASCII, whereas UTF-16 doesn't. It is also the most compact way to represent content consisting largely of the English alphabet, while still allowing the full Unicode repertoire when needed. A good reason to choose UTF-16 would be if your language or framework uses it natively, or if you will mainly be storing characters outside ASCII, such as text in East Asian languages.
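A rough Python comparison of those trade-offs, using two arbitrary sample strings (real documents usually mix scripts and markup, so actual ratios will vary). It also shows why UTF-16 trips up ASCII-oriented code: every ASCII character gains a zero byte, which C-style code treats as end-of-string. The helper `encoded_sizes` is hypothetical.

```python
english = "Hello, world"
chinese = "\u4f60\u597d\u4e16\u754c"  # four CJK code points


def encoded_sizes(text: str) -> tuple:
    """(UTF-8 byte count, UTF-16 byte count), without a byte-order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for label, text in (("English", english), ("Chinese", chinese)):
    utf8, utf16 = encoded_sizes(text)
    print(f"{label}: UTF-8 {utf8} bytes, UTF-16 {utf16} bytes")

# A zero byte after every ASCII character in UTF-16; none in UTF-8.
utf16_bytes = english.encode("utf-16-le")
print(utf16_bytes[:4])                     # b'H\x00e\x00'
print(b"\x00" in english.encode("utf-8"))  # False
```

Here the English sample takes 12 bytes in UTF-8 versus 24 in UTF-16, while the Chinese sample takes 12 versus 8.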

answered Sep 22 '22 18:09

thomasrutter



Tags: c#, asp.net, unicode

Pola Edward
