
Is ISO-8859-1 a Unicode charset?

I have been attending a lecture on XML where it was written that "ISO-8859-1 is a Unicode format". It sounds wrong to me, but as I research it, I struggle to understand precisely what Unicode is.

Can you call ISO-8859-1 a Unicode format? What can you actually call Unicode?

mdup asked Oct 09 '12 07:10


People also ask

Is ISO 8859 the same as UTF-8?

UTF-8 is a multibyte encoding that can represent any Unicode character. ISO 8859-1 is a single-byte encoding that can represent the first 256 Unicode characters. Both encode ASCII exactly the same way.
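
A minimal sketch of that difference, using Python's built-in `utf-8` and `iso-8859-1` codecs. The sample strings are arbitrary choices for illustration: pure ASCII text yields identical bytes under both encodings, while `é` (U+00E9) takes one byte in ISO 8859-1 and two in UTF-8.

```python
ascii_text = "ASCII only"
accented = "é"  # U+00E9, inside the first 256 Unicode code points

# ASCII characters are encoded identically by both encodings.
same_for_ascii = ascii_text.encode("utf-8") == ascii_text.encode("iso-8859-1")

# Above U+007F the two encodings diverge.
latin1_bytes = accented.encode("iso-8859-1")  # single byte 0xE9
utf8_bytes = accented.encode("utf-8")         # two bytes 0xC3 0xA9

print(same_for_ascii, latin1_bytes, utf8_bytes)
```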

What is the difference between ASCII and ISO-8859-1?

ISO 8859 is an eight-bit extension to ASCII developed by ISO (the International Organization for Standardization). ISO 8859 includes the 128 ASCII characters along with an additional 128 characters, such as the British pound symbol and the American cent symbol.

What is encoding ISO 8859?

ISO/IEC 8859-1 encodes what it refers to as "Latin alphabet no. 1", consisting of 191 characters from the Latin script. This character-encoding scheme is used throughout the Americas, Western Europe, Oceania, and much of Africa.

What type of encoding is Unicode?

Unicode defines three encoding forms: UTF-8, UTF-16, and UTF-32, which use 8-bit, 16-bit, and 32-bit code units respectively. A code point is usually written as U+hhhh, where hhhh is its hexadecimal value (four to six digits); this notation identifies the character and does not depend on the encoding form.


1 Answer

ISO 8859-1 is not Unicode

ISO 8859-1 is also known as Latin-1. It is not directly a Unicode format.

However, it does have the unique privilege that its code points 0x00 .. 0xFF map one-to-one to the Unicode code points U+0000 .. U+00FF. So, the first 256 code points of Unicode, treated as 1-byte unsigned integers, map to ISO 8859-1.
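
The one-to-one mapping can be checked with a short sketch, assuming Python's built-in `iso-8859-1` codec (which also passes the control positions straight through). The helper `fits_latin1` is a hypothetical name introduced here for illustration.

```python
# Decode every possible byte value as ISO 8859-1 and compare the resulting
# Unicode code points with the original byte values.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("iso-8859-1")
code_points = [ord(ch) for ch in decoded]

# Encoding back must reproduce the original bytes exactly.
round_trip = decoded.encode("iso-8859-1")


def fits_latin1(text):
    """Return True if every character of text is within U+0000..U+00FF."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


print(code_points == list(range(256)), round_trip == all_bytes)
print(fits_latin1("é"), fits_latin1("€"))  # U+00E9 fits, U+20AC does not
```

The reverse direction does not hold: any code point above U+00FF, such as the euro sign U+20AC, has no ISO 8859-1 representation, which is one way to see that ISO 8859-1 is not an encoding of Unicode as a whole.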


Control characters

Peregring-lk observes that ISO 8859-1 does not define the control codes. The Unicode charts for U+0000..U+007F and U+0080..U+00FF suggest that the C0 controls found in positions U+0000..U+001F and U+007F come from ISO/IEC 6429:1992 and the C1 controls found in positions U+0080..U+009F likewise. Wikipedia on the C0 and C1 controls suggests that the standard is ISO/IEC 2022 instead. Note that three of the C1 controls do not have a formal name.

In general parlance, the control code points of the ISO 8859-1 code set are assumed to be the C0 and C1 controls from ISO 6429 (or 2022).
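
As a rough illustration, Python's standard `unicodedata` module can list which of the first 256 Unicode code points carry the general category `Cc` (control). This is a sketch on the Unicode side only; it says nothing about which ISO standard the controls originate from.

```python
import unicodedata

# Code points in U+0000..U+00FF that Unicode classifies as control characters.
control_points = [cp for cp in range(0x100)
                  if unicodedata.category(chr(cp)) == "Cc"]
# Everything else in that range is a graphic (non-control) position.
graphic_points = [cp for cp in range(0x100) if cp not in control_points]

# The ranges named above: C0 (plus DEL at 0x7F) and C1.
c0 = list(range(0x00, 0x20)) + [0x7F]
c1 = list(range(0x80, 0xA0))

print(control_points == sorted(c0 + c1))
print(len(control_points), len(graphic_points))
```

The 65 control positions are exactly the C0 and C1 ranges plus U+007F, which leaves 191 positions for the characters that ISO 8859-1 itself defines.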

Jonathan Leffler answered Oct 16 '22 06:10