Why do we need both UCS and Unicode character sets? [closed]

2 Answers

UCS and Unicode are not two standards as such. The Universal Character Set (UCS) is not a standard but something defined in a standard, namely ISO 10646. The UCS itself should not be confused with encodings such as UCS-2.

It is difficult to guess whether you actually mean different encodings or different standards. Regarding the latter, Unicode and ISO 10646 were originally two distinct standardization efforts with different goals and strategies. They were, however, harmonized in the early 1990s to avoid the mess that two competing standards would cause. They have been kept coordinated since then, so the code point assignments are the same in both.

They were kept distinct, though, partly because Unicode is defined by an industry consortium that can work flexibly and has great interest in standardizing things beyond simple code point assignments. The Unicode Standard defines a large number of principles and processing rules, not just the characters. ISO 10646 is a formal standard that can be referenced in standards and other documents of the ISO and its members.

141

answered Oct 14 '22 18:10

Jukka K. Korpela

The code points are the same, but there are some differences. From the Wikipedia entry on the differences between Unicode and ISO 10646 (i.e. the UCS):

The difference between them is that Unicode adds rules and specifications that are outside the scope of ISO 10646. ISO 10646 is a simple character map, an extension of previous standards like ISO 8859. In contrast, Unicode adds rules for collation, normalization of forms, and the bidirectional algorithm for scripts like Hebrew and Arabic.
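To make the quoted distinction concrete, here is a minimal Python sketch using the standard library's unicodedata module. It only illustrates two of the things the Unicode Standard defines on top of the shared code point assignments (normalization and bidirectional character classes). The variable names are chosen for illustration and are not from the original answer.

```python
import unicodedata

# "é" can be written as one code point or as "e" plus a combining accent.
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# As raw code point sequences the two strings differ.
raw_equal = composed == decomposed

# Unicode normalization maps them to a common form.
nfc_equal = unicodedata.normalize("NFC", decomposed) == composed
nfd_equal = unicodedata.normalize("NFD", composed) == decomposed

# Bidirectional class: a character property used by the
# Unicode bidirectional algorithm.
hebrew_alef_class = unicodedata.bidirectional("\u05d0")
arabic_alef_class = unicodedata.bidirectional("\u0627")
latin_a_class = unicodedata.bidirectional("a")

print(raw_equal, nfc_equal, nfd_equal)
print(hebrew_alef_class, arabic_alef_class, latin_a_class)
```

A plain character map tells you which number belongs to which character. Rules like these say how strings of those characters should be compared and displayed.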

You might find it useful to read Joel Spolsky's article "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)".

I think the practical differences come from the way the code points are encoded. The UCS-x encodings use a fixed number of bytes per code point. For example, UCS-2 always uses two bytes, so it cannot encode code points above U+FFFF, which would need more than two bytes. UTF-8 and UTF-16, on the other hand, use a variable number of bytes. For example, UTF-8 uses one byte for ASCII characters and two to four bytes for characters outside the ASCII range. (UTF-32 is the exception among the UTF encodings: it always uses four bytes.)
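The byte counts described above can be checked with a short Python sketch. The helper names encoded_lengths and fits_in_ucs2 are hypothetical and introduced only for this illustration. Python ships no UCS-2 codec, so the UCS-2 limit is modeled here as a simple range check on the code point value.

```python
def encoded_lengths(ch):
    """Return the byte length of one character in UTF-8, UTF-16 and UTF-32."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),  # "-le" variant: no byte order mark added
        len(ch.encode("utf-32-le")),
    )


def fits_in_ucs2(ch):
    """Assumption: UCS-2 can only hold code points U+0000..U+FFFF."""
    return ord(ch) <= 0xFFFF


# A, ñ, the euro sign, and an emoji outside the Basic Multilingual Plane.
samples = ["A", "\u00f1", "\u20ac", "\U0001F600"]

for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_lengths(ch), fits_in_ucs2(ch))
```

The last sample, U+1F600, takes four bytes in UTF-16 (a surrogate pair) and has no representation in two-byte UCS-2.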

answered Oct 14 '22 18:10

Juuso Ohtonen



Tags:

unicode

ucs

Lunar Mushrooms
