I recently went through an article on character encoding, and I have a question about one point made there. In the first figure, the author shows several characters, their code points in various character sets, and how those code points are encoded in various encoding formats.
For example, the code point of é is E9. In ISO-8859-1 it is encoded as the single byte E9. In UTF-16 it is represented as 00 E9. But in UTF-8 it is represented using two bytes, C3 A9.
My question is: why is this necessary? The value fits in a single byte, so why does UTF-8 use two?
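(For reference, the byte sequences above can be reproduced in Python 3; this is just a quick check, assuming only the standard library:)

```python
# Observe the encodings of é (code point U+00E9) in each scheme.
ch = "\u00e9"  # é

print(ch.encode("latin-1").hex(" "))    # e9     (ISO-8859-1: one byte)
print(ch.encode("utf-16-be").hex(" "))  # 00 e9  (UTF-16, big-endian: two bytes)
print(ch.encode("utf-8").hex(" "))      # c3 a9  (UTF-8: two bytes)
```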
UTF-8 reserves the high bits of each byte to signal how many bytes belong to a character. A byte in the range 00-7F stands alone (plain ASCII), so its high bit is 0. Anything above 7F must be spread over a multi-byte sequence: the lead byte starts with the bits 110 (for a two-byte sequence) and every continuation byte starts with 10, leaving only the low 6 bits of a continuation byte for actual character data. That means that any character over 7F requires (at least) 2 bytes.
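Here is a minimal sketch of that bit layout in Python (the helper name utf8_two_byte is my own, not anything standard):

```python
def utf8_two_byte(code_point: int) -> bytes:
    """Encode a code point in the range 0x80..0x7FF as two UTF-8 bytes.

    Layout: 110xxxxx 10xxxxxx. The lead byte carries the top 5 bits,
    the continuation byte carries the low 6 bits.
    """
    assert 0x80 <= code_point <= 0x7FF
    lead = 0b11000000 | (code_point >> 6)            # 110xxxxx
    continuation = 0b10000000 | (code_point & 0x3F)  # 10xxxxxx
    return bytes([lead, continuation])

# é is U+00E9 = 0b000_11101001:
#   top 5 bits -> 00011  -> lead byte      110_00011 = 0xC3
#   low 6 bits -> 101001 -> continuation    10_101001 = 0xA9
print(utf8_two_byte(0xE9).hex(" "))  # c3 a9
print("é".encode("utf-8").hex(" "))  # c3 a9 (matches the built-in encoder)
```

This also shows why E9 on its own would be ambiguous in UTF-8: a lone byte with the high bit set would look like a stray continuation byte, so the value has to be repacked into the two-byte pattern.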