
Why does text use UTF-16 as opposed to UTF-8?


The Haskell text library uses UTF-16 internally. UTF-8 is the more commonly used encoding, especially in C libraries. In addition, UTF-8 uses less memory most of the time. Why does text use UTF-16?
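
To make the memory claim concrete, here is a small language-neutral sketch in Python (not the text library itself) that counts the encoded bytes of a few sample strings. The helper name `encoded_sizes` and the sample strings are made up for illustration.

```python
def encoded_sizes(s):
    """Return (UTF-8 byte count, UTF-16 byte count) for the string s."""
    # "utf-16-le" is used so that no 2-byte byte-order mark is prepended,
    # which plain "utf-16" would add to the output.
    return len(s.encode("utf-8")), len(s.encode("utf-16-le"))


# Illustrative samples, written with escapes so the source file's own
# encoding cannot change the result.
samples = {
    "ASCII": "hello",
    "Latin with accent": "h\u00e9llo",                 # U+00E9, precomposed e-acute
    "Japanese": "\u3053\u3093\u306b\u3061\u306f",      # five hiragana characters
    "Emoji": "\U0001F600",                             # outside the Basic Multilingual Plane
}

for label, s in samples.items():
    utf8, utf16 = encoded_sizes(s)
    print(f"{label}: UTF-8 = {utf8} bytes, UTF-16 = {utf16} bytes")
```

ASCII-heavy text is half the size in UTF-8, while text made mostly of characters such as kana is smaller in UTF-16, which is why "most of the time" depends on the data being stored.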

fread2281 asked May 20 '14 17:05



1 Answer

There was a project to convert text to use UTF-8 internally, which is possible because the internal encoding is irrelevant to the API the library provides. Once enough of the conversion was done to benchmark it, the UTF-8 version was judged not to be an improvement, and it has not been integrated into the mainline at this time. It may be in the future, if it can be made a sufficient improvement. Here's the full story: http://jaspervdj.be/posts/2011-08-19-text-utf8-the-aftermath.html

Carl answered Sep 18 '22 06:09