What is the difference between UTF-32 and UCS-4? Isn't UTF-32 supposed to be a fixed-width encoding?



2 Answers

The Unicode Standard Version 8.0, Appendix C states:

UCS-4 stands for “Universal Character Set coded in 4 octets.” It is now treated simply as a synonym for UTF-32, and is considered the canonical form for representation of characters in ISO 10646 (Universal Coded Character Set).
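On the fixed-width part of the question, here is a minimal Python sketch (the sample characters and the helper name `utf32_width` are chosen only for illustration). It encodes a few code points from different planes and counts the bytes each one takes in UTF-32, with UTF-8 shown for contrast:

```python
# Sample code points: U+0041, U+00E9, U+20AC (all BMP) and U+1F600 (plane 1).
samples = ["\u0041", "\u00e9", "\u20ac", "\U0001F600"]


def utf32_width(ch: str) -> int:
    """Number of bytes one code point occupies in UTF-32.

    The "utf-32-be" codec is used so that no byte order mark is prepended.
    """
    return len(ch.encode("utf-32-be"))


utf32_widths = [utf32_width(ch) for ch in samples]
utf8_widths = [len(ch.encode("utf-8")) for ch in samples]

print(utf32_widths)  # [4, 4, 4, 4]
print(utf8_widths)   # [1, 2, 3, 4]
```

Every code point takes four bytes in UTF-32, which is what "fixed-width" refers to here; a user-perceived character built from several code points still takes several 4-byte units.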

answered Oct 03 '22 06:10

Jonathan Maddox

UTF-32 started as a subset of UCS-4. The two are now identical, except that the UTF-32 standard has additional Unicode semantics. See the details on Wikipedia:

The original ISO 10646 standard defines a 31-bit encoding form called UCS-4, in which each encoded character in the Universal Character Set (UCS) is represented by a 32-bit friendly code value in the code space of integers between 0 and hexadecimal 7FFFFFFF.

Because only 17 planes are actually in use, all current code points are between 0 and 0x10FFFF. UTF-32 is a subset of UCS-4 that uses only this range. Since the Principles and Procedures document of JTC1/SC2/WG2 states that all future assignments of characters will be constrained to the BMP or the first 14 supplementary planes, UTF-32 will be able to represent all Unicode characters. Accordingly, UCS-4 and UTF-32 are now identical except that the UTF-32 standard has additional Unicode semantics.
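The two ranges mentioned in the quote can be compared with a short Python sketch. The constant and function names are hypothetical, and it checks only the numeric bounds given above, not any other rule either standard may impose:

```python
UCS4_MAX = 0x7FFFFFFF   # upper bound of the original 31-bit UCS-4 code space
UNICODE_MAX = 0x10FFFF  # last code point of the 17 planes, the UTF-32 limit


def in_original_ucs4(value: int) -> bool:
    """True if value fits the original 31-bit UCS-4 code space."""
    return 0 <= value <= UCS4_MAX


def in_utf32_range(value: int) -> bool:
    """True if value lies within the range that UTF-32 is restricted to."""
    return 0 <= value <= UNICODE_MAX


for value in (0x41, 0x10FFFF, 0x110000, 0x7FFFFFFF):
    print(hex(value), in_original_ucs4(value), in_utf32_range(value))
```

Values such as 0x110000 fit the original UCS-4 code space but fall outside the UTF-32 range. Python's own `chr()` enforces the same upper bound and raises `ValueError` for 0x110000.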

However, I am not exactly sure what "additional Unicode semantics" means. Maybe someone can provide a better answer.

answered Oct 03 '22 05:10

Christian Gollhardt

Tags: string, char, encoding, unicode, utf

Virus721
