Given this area of the Unicode table, for instance:
...
U+1D44E Dec:119886 MATHEMATICAL ITALIC SMALL A 𝑎
U+1D44F Dec:119887 MATHEMATICAL ITALIC SMALL B 𝑏
U+1D450 Dec:119888 MATHEMATICAL ITALIC SMALL C 𝑐
U+1D451 Dec:119889 MATHEMATICAL ITALIC SMALL D 𝑑
U+1D452 Dec:119890 MATHEMATICAL ITALIC SMALL E 𝑒
U+1D453 Dec:119891 MATHEMATICAL ITALIC SMALL F 𝑓
U+1D454 Dec:119892 MATHEMATICAL ITALIC SMALL G 𝑔
U+1D456 Dec:119894 MATHEMATICAL ITALIC SMALL I 𝑖 # what?!
U+1D457 Dec:119895 MATHEMATICAL ITALIC SMALL J 𝑗
U+1D458 Dec:119896 MATHEMATICAL ITALIC SMALL K 𝑘
U+1D459 Dec:119897 MATHEMATICAL ITALIC SMALL L 𝑙
U+1D45A Dec:119898 MATHEMATICAL ITALIC SMALL M 𝑚
U+1D45B Dec:119899 MATHEMATICAL ITALIC SMALL N 𝑛
U+1D45C Dec:119900 MATHEMATICAL ITALIC SMALL O 𝑜
...
I would naturally expect U+1D455 to be MATHEMATICAL ITALIC SMALL H, but it does not seem to be defined in any table I have looked at.
Why are there holes in the Unicode table? (See also U+1D49D, U+1D53A, etc.)
Is there any way I can fill them?
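One quick way to confirm that U+1D455 really is a hole, and not just missing from the tables you happened to check, is Python's standard `unicodedata` module, which returns no name for an unassigned code point. This is a small sketch: `describe` is a hypothetical helper, and the results reflect the Unicode version bundled with your Python build.

```python
import unicodedata


def describe(cp: int) -> str:
    """Return 'U+XXXX NAME', or '<unassigned>' when the code point has no name."""
    name = unicodedata.name(chr(cp), None)  # default is returned for unassigned code points
    return f"U+{cp:04X} {name if name else '<unassigned>'}"


for cp in (0x1D454, 0x1D455, 0x1D456, 0x210E):
    print(describe(cp))
```

This prints MATHEMATICAL ITALIC SMALL G and I around an `<unassigned>` line for U+1D455, followed by U+210E PLANCK CONSTANT.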
[EDIT]: These links do state:
The "holes" in the alphabetic ranges are filled by previously defined characters in the Letterlike Symbols block shown below.
and
The Unicode Consortium adds new codepoints to the standard all the time. Visit their website to find out about pending codepoints and whether this one is in the pipe. The following table shows typical representations of how the codepoint would look, if it existed. This may help you when debugging, but is not of real use otherwise.
But I just... don't understand what they mean :\
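ASCII cannot encode the many kinds of characters used around the world: it defines only 128 characters, each fitting in 7 bits. Unicode assigns code points to far more characters and can be stored using several encoding forms, namely UTF-8, UTF-16 and UTF-32. So the key difference is not just the number of bits per character: ASCII is a small, fixed character set, while Unicode is a much larger character set with several ways of encoding it.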
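To make the encoding point concrete, here is a minimal Python sketch that checks whether a character fits in ASCII and how many bytes it takes in each UTF encoding form. The helper names `byte_lengths` and `fits_ascii` are illustrative only; the big-endian codecs are used so that no byte-order mark is counted.

```python
from typing import Tuple


def byte_lengths(ch: str) -> Tuple[int, int, int]:
    """Bytes needed for ch in UTF-8, UTF-16 and UTF-32 (big-endian, no BOM)."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-be")),
        len(ch.encode("utf-32-be")),
    )


def fits_ascii(ch: str) -> bool:
    """True if ch can be encoded in 7-bit ASCII."""
    try:
        ch.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


for ch in ["A", "\u00e9", "\u210e", "\U0001d44e"]:  # A, é, ℎ, 𝑎
    print(f"U+{ord(ch):04X} ascii={fits_ascii(ch)} bytes(utf-8, utf-16, utf-32)={byte_lengths(ch)}")
```

For 𝑎 (U+1D44E) every form needs 4 bytes, because it lies outside the Basic Multilingual Plane and UTF-16 has to use a surrogate pair.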
The Unicode Standard is intended to support the needs of all types of users, whether in business or academia, using mainstream or minority scripts. As of Version 15.0, the Unicode Standard contains 149,186 characters.
A code point is a unique number assigned to a character or to a symbol such as an accent mark or ligature. Unicode has 1,114,112 code points (U+0000 to U+10FFFF), written as "U+" followed by the number in hexadecimal; for example, the word "Hello" is U+0048 U+0065 U+006C U+006C U+006F.
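The U+ notation can be reproduced in Python by formatting each character's `ord()` value as uppercase hex with at least four digits. `to_code_points` is a hypothetical helper name used only for this sketch.

```python
import sys


def to_code_points(text: str) -> str:
    """Render each character as 'U+' plus at least four uppercase hex digits."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


print(to_code_points("Hello"))       # U+0048 U+0065 U+006C U+006C U+006F
print(to_code_points("\U0001d44e"))  # U+1D44E
print(sys.maxunicode + 1)            # 1114112 code points, U+0000 to U+10FFFF
```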
From the comments (cheers, guys), I have learnt that these holes exist because some of the characters had already been assigned elsewhere in Unicode by the time the whole alphabet was added.
For instance, before the MATHEMATICAL ITALIC SMALL letters (U+1D44E to U+1D467) were defined, ℎ was already in the table as:
U+210E Dec:008462 PLANCK CONSTANT ℎ # here it is
So, to keep the numbering consistent without duplicating ℎ, a hole was left at position U+1D455.
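As for filling the hole yourself: code points are assigned only by the Unicode Consortium, so U+1D455 cannot be filled, but when converting plain letters to math italic you can route `h` to U+210E instead. A minimal sketch, assuming only lowercase ASCII letters need converting; `to_math_italic` and `ITALIC_SMALL_EXCEPTIONS` are hypothetical names, not part of any library.

```python
import string

ITALIC_SMALL_A = 0x1D44E  # MATHEMATICAL ITALIC SMALL A
# Hypothetical table of letters whose MATHEMATICAL ITALIC SMALL slot is a
# reserved hole; the character already existed elsewhere, so use that one.
ITALIC_SMALL_EXCEPTIONS = {"h": 0x210E}  # PLANCK CONSTANT


def to_math_italic(text: str) -> str:
    """Map ASCII lowercase letters to math italic; leave everything else as-is."""
    out = []
    for ch in text:
        if ch in ITALIC_SMALL_EXCEPTIONS:
            out.append(chr(ITALIC_SMALL_EXCEPTIONS[ch]))
        elif ch in string.ascii_lowercase:
            out.append(chr(ITALIC_SMALL_A + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)


print(to_math_italic("ghi"))  # 𝑔ℎ𝑖
```

NFKC normalization folds both ℎ and the math italic letters back to plain ASCII, which is one way to see that they belong to the same family: `unicodedata.normalize("NFKC", to_math_italic("hi"))` gives `"hi"`.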
Similarly, ℬ is known as U+212C SCRIPT CAPITAL B rather than as U+1D49D (reserved) in the MATHEMATICAL SCRIPT CAPITAL letter family.
Similarly, ℂ from the MATHEMATICAL DOUBLE-STRUCK CAPITAL letter family is not U+1D53A, because it was already known as U+2102 DOUBLE-STRUCK CAPITAL C.
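The same pattern repeats across the block. The sketch below, a heuristic rather than an official mapping, lists every reserved slot among the Latin letters of the Mathematical Alphanumeric Symbols block (U+1D400 to U+1D6A3) and guesses the older Letterlike Symbols character that stands in for each one. It assumes each style occupies 52 consecutive code points (A to Z, then a to z) in the order given in `STYLES`, and that the older character's name is the math name without `MATHEMATICAL`, with `FRAKTUR` spelled `BLACK-LETTER`; PLANCK CONSTANT is handled as a special case. All helper names are hypothetical.

```python
import unicodedata
from typing import List, Optional, Tuple

# Assumed order of the 52-letter Latin alphabets (A-Z, then a-z) in the
# Mathematical Alphanumeric Symbols block, starting at U+1D400.
STYLES = [
    "BOLD", "ITALIC", "BOLD ITALIC", "SCRIPT", "BOLD SCRIPT", "FRAKTUR",
    "DOUBLE-STRUCK", "BOLD FRAKTUR", "SANS-SERIF", "SANS-SERIF BOLD",
    "SANS-SERIF ITALIC", "SANS-SERIF BOLD ITALIC", "MONOSPACE",
]
BASE = 0x1D400  # MATHEMATICAL BOLD CAPITAL A


def slot_info(cp: int) -> Tuple[str, str, str]:
    """Return the (style, case, letter) a code point's slot would stand for."""
    style_index, pos = divmod(cp - BASE, 52)
    case = "CAPITAL" if pos < 26 else "SMALL"
    letter = chr(ord("A") + pos % 26)
    return STYLES[style_index], case, letter


def substitute(cp: int) -> Optional[int]:
    """Guess the older Letterlike Symbols character used instead of a hole."""
    style, case, letter = slot_info(cp)
    if (style, case, letter) == ("ITALIC", "SMALL", "H"):
        return 0x210E  # PLANCK CONSTANT: its name does not follow the pattern
    # Older names omit "MATHEMATICAL" and call Fraktur "BLACK-LETTER".
    if style == "FRAKTUR":
        style = "BLACK-LETTER"
    try:
        return ord(unicodedata.lookup(f"{style} {case} {letter}"))
    except KeyError:
        return None  # no character with that name in this Unicode version


def find_holes() -> List[int]:
    """Unassigned (category Cn) code points among the Latin math letters."""
    end = BASE + 52 * len(STYLES)  # one past MATHEMATICAL MONOSPACE SMALL Z
    return [cp for cp in range(BASE, end) if unicodedata.category(chr(cp)) == "Cn"]


if __name__ == "__main__":
    for cp in find_holes():
        style, case, letter = slot_info(cp)
        sub = substitute(cp)
        found = f"U+{sub:04X} {unicodedata.name(chr(sub))}" if sub is not None else "?"
        print(f"U+{cp:04X} MATHEMATICAL {style} {case} {letter} -> {found}")
```

Each printed line pairs a reserved code point with the name it would have had and the existing character used in its place, e.g. U+1D53A maps to U+2102 DOUBLE-STRUCK CAPITAL C.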
This was a difficult choice, balancing backward compatibility, consistency and reliability all at once :)