
fatal error: high- and low-surrogate code points are not valid Unicode scalar values [duplicate]

Sometimes, initializing a UnicodeScalar with a value like 57292 yields the following error:

fatal error: high- and low-surrogate code points are not valid Unicode scalar values

What is this error, why does it occur and how can I prevent it in the future?

Vatsal Manot asked Aug 22 '15 16:08



1 Answer

Background: UTF-16 represents a sequence of Unicode characters ("code points") as a sequence of 16-bit "code units". For characters whose scalar values fit within 16 bits (i.e., those from U+0000 to U+FFFF), the code unit has the same value as the character; but for characters outside that range (those from U+10000 to U+10FFFF), UTF-16 has to use two code units. To make this work, Unicode reserves a range of code points (U+D800 to U+DFFF) as "surrogates", which cannot be used as characters; UTF-16 can then use two of these surrogates together to represent a code point outside the 16-bit range. ("High" and "low" refer to the surrogates that serve as the first and second code units of such a pair, respectively: high surrogates are U+D800 to U+DBFF, and low surrogates are U+DC00 to U+DFFF. Each surrogate is either a high surrogate or a low surrogate, but not both; experience with older character sets had shown that it's very useful to always be able to tell where one character ends and the next begins.)
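To make the pairing concrete, here is a minimal sketch of how a code point above U+FFFF is split into a high and a low surrogate. The function name `surrogatePair(for:)` is made up for this illustration and is not part of the standard library; in real code you would normally just read a string's `utf16` view.

```swift
// Hypothetical helper (not a standard library API): split a code point in
// U+10000...U+10FFFF into its UTF-16 surrogate pair. Returns nil for any
// value outside that range, since such values do not use a pair.
func surrogatePair(for codePoint: UInt32) -> (high: UInt16, low: UInt16)? {
    guard (0x10000...0x10FFFF).contains(codePoint) else { return nil }
    let offset = codePoint - 0x10000              // a 20-bit value
    let high = UInt16(0xD800 + (offset >> 10))    // top 10 bits -> U+D800...U+DBFF
    let low = UInt16(0xDC00 + (offset & 0x3FF))   // bottom 10 bits -> U+DC00...U+DFFF
    return (high, low)
}

// U+1F600 lies outside the 16-bit range, so it needs two code units.
if let pair = surrogatePair(for: 0x1F600) {
    print(String(pair.high, radix: 16), String(pair.low, radix: 16))
}
```

The result can be compared against what the standard library produces with `Array("\u{1F600}".utf16)`.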

So the issue you're seeing is that you're trying to create a UnicodeScalar with a value (57292, i.e. U+DFCC) that the Unicode standard explicitly excludes from the set of Unicode scalar values. U+DFCC falls in the low-surrogate range, so it is not a character in its own right; it can only stand in for half of a scalar that does exist.

To prevent this issue, you need to stick to scalar values that do exist: U+0000 to U+D7FF and U+E000 to U+10FFFF.
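As a sketch of how to guard against this, the value can be checked against those two ranges before use. The helper names `isValidScalarValue` and `makeScalar` are hypothetical. The second helper assumes a Swift version (3 or later, as far as I know) in which the `UInt32` initializer of `Unicode.Scalar` is failable and returns nil instead of trapping; the Swift version current when the question was asked crashed with the fatal error instead.

```swift
// Hypothetical helper: true when `value` lies in one of the two valid ranges,
// U+0000...U+D7FF or U+E000...U+10FFFF.
func isValidScalarValue(_ value: UInt32) -> Bool {
    return (0x0000...0xD7FF).contains(value) || (0xE000...0x10FFFF).contains(value)
}

// Hypothetical helper: assumes the failable Unicode.Scalar(UInt32) initializer,
// which yields nil for surrogates and for values above U+10FFFF.
func makeScalar(_ value: UInt32) -> Unicode.Scalar? {
    return Unicode.Scalar(value)
}

let suspect: UInt32 = 57292   // 0xDFCC, a low surrogate
if let scalar = makeScalar(suspect) {
    print("Created scalar \(scalar)")
} else {
    print("U+\(String(suspect, radix: 16, uppercase: true)) is not a Unicode scalar value")
}
```

Checking the ranges by hand and relying on the failable initializer should agree for every `UInt32` value; the explicit range check is mainly useful when validating input before it reaches the initializer.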

ruakh answered Sep 19 '22 23:09
