
Is byte 0xFF valid in a UTF-8 encoded string?

Tags:

utf-8

Can a UTF-8 string contain the byte 0xFF (255)?

Guillaume Brunerie asked Apr 14 '11 01:04


People also ask

What characters are not allowed in UTF-8?

0xC0, 0xC1, 0xF5, 0xF6, 0xF7, 0xF8, 0xF9, 0xFA, 0xFB, 0xFC, 0xFD, 0xFE, 0xFF are invalid UTF-8 code units. A UTF-8 code unit is 8 bits.
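As a sanity check on that list, here is a minimal sketch that encodes every Unicode scalar value with Python's built-in UTF-8 codec and collects the byte values that never show up. The variable names are illustrative, and the loop covers about 1.1 million code points, so it takes a moment to run.

```python
# Collect every byte value that appears in the UTF-8 encoding of any
# Unicode scalar value. Surrogates (U+D800..U+DFFF) are skipped because
# they cannot be encoded as UTF-8.
used = set()
for cp in range(0x110000):
    if 0xD800 <= cp <= 0xDFFF:
        continue
    used.update(chr(cp).encode("utf-8"))

# Byte values that no valid encoding ever produces.
never_used = sorted(set(range(256)) - used)
print([f"0x{b:02X}" for b in never_used])
```

The printed list should match the thirteen values named above.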

What is an invalid UTF-8 string?

An "invalid UTF-8" error is raised when input, such as an uploaded file, is not valid UTF-8. UTF-8 is the dominant character encoding on the World Wide Web. The error usually occurs because the software that produced the file saved it in a different encoding, such as one of the ISO-8859 encodings, instead of UTF-8.

What is byte 0xff?

In Java, the signed byte 0xff represents the value -1, because Java uses two's complement for signed values. The most significant bit is 1, so it contributes -128, and the remaining bits add up as usual: -128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = -1.

How many bytes is a string in UTF-8?

UTF-8 is based on 8-bit code units. Each Unicode code point is encoded as 1 to 4 bytes.
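To illustrate the 1 to 4 byte range, this short sketch encodes one sample character from each length class. The sample characters are arbitrary choices made for this example.

```python
# One sample character per encoded length: U+0041, U+00E9, U+20AC, U+1F600.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Number of bytes each character occupies once encoded as UTF-8.
lengths = [len(ch.encode("utf-8")) for ch in samples]

for ch, n in zip(samples, lengths):
    print(f"U+{ord(ch):04X} -> {n} byte(s)")
```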


1 Answer

No. It is specifically forbidden by the spec: RFC 3629 states that the octet values C0, C1, and F5 to FF never appear in UTF-8.
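One way to see this in practice is to ask a strict decoder. The sketch below uses Python's built-in codec; `is_valid_utf8` is a hypothetical helper written for this example, not a standard library function. It also shows that the code point U+00FF (ÿ) is perfectly legal, but is encoded as the two bytes C3 BF rather than a single FF byte.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")  # "strict" error handling is the default
        return True
    except UnicodeDecodeError:
        return False

# A raw 0xFF byte anywhere in the data makes it invalid.
print(is_valid_utf8(b"abc\xff"))                # False

# The code point U+00FF is valid; its encoding contains no 0xFF byte.
encoded = "\u00ff".encode("utf-8")
print(is_valid_utf8(encoded))                   # True
print(encoded.hex())                            # c3bf
```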

Dour High Arch answered Sep 30 '22 03:09