
Numeric equivalence between HTML Character Entities and Delphi?

The HTML Character Entity 𝕒:

can be created from the number 120146 with this HTML code:

<!DOCTYPE html>
<html>
<style>
body {
    font-size: 20px;
}
</style>
<body>

<p>I will display &#120146;</p>

</body>
</html>

Some of these extended characters can be created from the same numeric value in both HTML and Delphi 10.1.2. For example:

Both &#174; and Chr(174) create the "registered trademark" symbol character ®

Both &#163; and Chr(163) create the "pound" symbol character £

Etc.

Unfortunately, this does not work for the number 120146: Chr(120146) in Delphi produces a "funny Chinese symbol".

So how can I create the &aopf; character above from the number 120146 in Delphi? And in which numeric range does this equivalence between HTML and Delphi hold, and where does it break down?

asked Sep 21 '17 19:09 by user1580348


1 Answer

This is 'MATHEMATICAL DOUBLE-STRUCK SMALL A' (U+1D552). It lies outside the Basic Multilingual Plane, so in UTF-16 it is encoded as a surrogate pair, which means that two UTF-16 character elements are required.

Look at your attempt: Chr(120146). Now, 120146 > High(Word) (= 65535), which tells you that your code cannot succeed, because each UTF-16 character element is only 16 bits in size. It would be nice if the compiler warned about this. Does it?
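This also answers the second part of the question, at least as a rule of thumb: Chr(n) can only produce the same character as &#n; when n fits in a single 16-bit UTF-16 element. A minimal sketch of that check follows; the function name FitsInOneUtf16Element is hypothetical, and excluding the surrogate range $D800..$DFFF is an assumption based on UTF-16 reserving those values for surrogate pairs (they are not characters on their own).

```pascal
program BmpCheck;

{$APPTYPE CONSOLE}

// Hypothetical helper (not part of the RTL): True when CodePoint can be
// represented by one UTF-16 element, i.e. when Chr(CodePoint) should give
// the same character as the HTML entity &#CodePoint;.
function FitsInOneUtf16Element(CodePoint: Cardinal): Boolean;
begin
  // Must fit in 16 bits and must not be a lone surrogate value
  Result := (CodePoint <= $FFFF) and
    not ((CodePoint >= $D800) and (CodePoint <= $DFFF));
end;

begin
  Writeln(FitsInOneUtf16Element(174));    // registered sign: inside the BMP
  Writeln(FitsInOneUtf16Element(163));    // pound sign: inside the BMP
  Writeln(FitsInOneUtf16Element(120146)); // U+1D552: outside the BMP
end.
```

Anything that fails this check (code points from $10000 up to $10FFFF) needs two UTF-16 elements, as described next.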

Its UTF-16 encoding is given by this surrogate pair:

0xD835 0xDD52
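The pair can be derived from the code point with the standard UTF-16 arithmetic: subtract 0x10000, then split the remaining 20 bits into a high 10-bit half (added to 0xD800) and a low 10-bit half (added to 0xDC00). Worked through for 120146 = 0x1D552:

$$
\begin{aligned}
\texttt{0x1D552} - \texttt{0x10000} &= \texttt{0xD552} \\
\text{high} &= \texttt{0xD800} + \lfloor \texttt{0xD552} / 2^{10} \rfloor = \texttt{0xD800} + \texttt{0x35} = \texttt{0xD835} \\
\text{low} &= \texttt{0xDC00} + (\texttt{0xD552} \bmod 2^{10}) = \texttt{0xDC00} + \texttt{0x152} = \texttt{0xDD52}
\end{aligned}
$$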

In Delphi that would be most easily written as:

#$D835#$DD52

If you are starting with the UTF-32 code as a numeric value then you can convert it to a Delphi string using TCharacter.ConvertFromUtf32 from the System.Character unit:

TCharacter.ConvertFromUtf32($1D552)

Obviously the argument to this function can be a variable.
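A minimal console sketch of that, passing the value through a variable and checking the result against the literal surrogate pair. The hand-written CodePointToUtf16 function is hypothetical and only illustrates the arithmetic; it assumes a valid code point (no range or surrogate checking), whereas in real code TCharacter.ConvertFromUtf32 is the one to use.

```pascal
program SurrogatePairDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Character;

// Hypothetical helper (not part of the RTL): encodes a code point as a
// UTF-16 string by hand. Assumes CodePoint is a valid Unicode scalar value.
function CodePointToUtf16(CodePoint: Cardinal): string;
var
  V: Cardinal;
begin
  if CodePoint <= $FFFF then
    // BMP: a single UTF-16 element is enough
    Result := Char(CodePoint)
  else
  begin
    // Supplementary planes: subtract $10000, then split the 20 bits
    V := CodePoint - $10000;
    Result := Char($D800 + (V shr 10)) + Char($DC00 + (V and $3FF));
  end;
end;

var
  CodePoint: UCS4Char;
  S: string;
begin
  CodePoint := 120146; // the same number as in &#120146;
  S := TCharacter.ConvertFromUtf32(CodePoint);
  Writeln(Length(S));                       // number of UTF-16 elements
  Writeln(S = #$D835#$DD52);                // compare with the literal pair
  Writeln(S = CodePointToUtf16(CodePoint)); // compare with the hand encoding
end.
```

Length reports UTF-16 elements, not characters, so a string holding this single character has length 2.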

If much of the above Unicode terminology is unknown to you, read these articles:

  • The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), Joel Spolsky.
  • A Programmer’s Introduction to Unicode, Nathan Reed.
answered Sep 19 '22 04:09 by David Heffernan