Which Languages Does UTF-8 Not Support?

Tags: utf-8, utf-16, utf, internationalization, c++builder

I'm working on internationalizing one of my programs for work. I'm trying to use foresight to avoid possible issues or having to redo the process down the road.

I see references to UTF-8, UTF-16 and UTF-32. My question has two parts:

1. What languages does UTF-8 not support?
2. What advantages do UTF-16 and UTF-32 have over UTF-8?

If UTF-8 works for everything, then I'm curious what the advantages of UTF-16 and UTF-32 are (e.g. special search features in a database, etc.). Understanding this should help me finish designing my program (and database connections) properly. Thanks!


asked Mar 27 '13 16:03

James Oravec

1 Answer

All three are just different ways to encode the same set of Unicode codepoints, so there are no languages supported by one and not another.
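One way to check this for yourself is to encode the same text in each form and decode it back. The sketch below uses Python 3 purely for illustration, and the sample string is an arbitrary choice: every encoding round-trips the text without loss, and only the byte counts differ.

```python
# Arbitrary sample mixing several scripts, plus U+1F600 (an emoji outside the
# Basic Multilingual Plane, which UTF-16 has to store as a surrogate pair).
text = "English, Ελληνικά, 日本語, עברית, \U0001F600"

for codec in ("utf-8", "utf-16", "utf-32"):
    encoded = text.encode(codec)
    # Decoding with the same codec gives back exactly the original codepoints.
    decoded = encoded.decode(codec)
    # Note: Python's "utf-16" and "utf-32" codecs prepend a byte order mark,
    # so their byte counts include 2 or 4 extra bytes.
    print(f"{codec:7} {len(encoded):3} bytes, round-trip ok: {decoded == text}")
```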

Sometimes UTF-16 is used by a system that you need to interoperate with - for instance, the Windows API uses UTF-16 natively.

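In theory, UTF-32 can represent any "character" in a single 32-bit integer without ever needing more than one, whereas UTF-8 and UTF-16 sometimes need several 8-bit or 16-bit integers for a single codepoint. In practice that isn't really true either: some characters exist both as a single precomposed codepoint and as a combining sequence (a base letter followed by one or more combining marks), so one user-perceived character can still span several 32-bit integers.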
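As an illustration of the combining case (again a Python 3 sketch, using the standard `unicodedata` module): "é" can be stored as the single codepoint U+00E9, or as "e" followed by U+0301 COMBINING ACUTE ACCENT. Both display as one character, but the second form takes two 32-bit units even in UTF-32.

```python
import unicodedata

precomposed = "\u00e9"   # é as one codepoint (LATIN SMALL LETTER E WITH ACUTE)
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

for label, s in (("precomposed", precomposed), ("decomposed", decomposed)):
    # "utf-32-le" is used so that no byte order mark is counted.
    units = len(s.encode("utf-32-le")) // 4
    print(f"{label:11}: {len(s)} codepoint(s), {units} UTF-32 unit(s)")

# The two forms compare unequal as raw strings...
print(precomposed == decomposed)  # False
# ...but match once both are normalized to the same form (NFC here).
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

The normalization step is also worth keeping in mind for database searches: two strings that look identical may only compare equal after both have been normalized to the same form.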

One advantage of UTF-8 over the others is that a certain class of bug shows up sooner. If your code assumes that the number of 8-bit or 16-bit integers equals the number of codepoints, UTF-8 breaks that assumption as soon as the text contains any non-ASCII codepoint, whereas with UTF-16 the bug can go unnoticed until you meet a codepoint outside the Basic Multilingual Plane (one that needs a surrogate pair).
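A small Python 3 sketch of why the counting bug surfaces sooner with UTF-8 (`code_units` is a hypothetical helper written for this illustration): an accented Latin letter already makes the UTF-8 unit count differ from the codepoint count, while the UTF-16 count only diverges once an emoji or other non-BMP codepoint appears.

```python
def code_units(s: str, codec: str, unit_size: int) -> int:
    """Hypothetical helper: how many code units of `unit_size` bytes are
    needed to store `s` in `codec`. Pass a BOM-free codec such as
    "utf-8" or "utf-16-le" so the count isn't inflated by a BOM."""
    return len(s.encode(codec)) // unit_size

samples = ["hello", "h\u00e9llo", "hi \U0001F600"]

for s in samples:
    cps = len(s)  # Python's len() on a str counts codepoints
    u8 = code_units(s, "utf-8", 1)
    u16 = code_units(s, "utf-16-le", 2)
    # !a prints an ASCII-escaped repr, so this works on any console encoding.
    print(f"{s!a}: {cps} codepoints, {u8} UTF-8 units, {u16} UTF-16 units")
```

For "héllo" the UTF-8 count (6) already differs from the codepoint count (5), while UTF-16 still reports 5; only the emoji sample makes UTF-16 diverge.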

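To answer your first question: since all three encodings cover the same codepoints, the only gaps are scripts that Unicode itself does not encode yet. Here's a list of scripts currently unsupported by Unicode: http://www.unicode.org/standard/unsupported.html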

answered Oct 15 '22 17:10

RichieHindle

