Why are there holes in the Unicode table?

Tags:

character-encoding

unicode

utf-8

standards

Given this area of the Unicode table, for instance:

  ...
𝑎    U+1D44E Dec:119886       MATHEMATICAL ITALIC SMALL A &#x1D44E;
𝑏    U+1D44F Dec:119887       MATHEMATICAL ITALIC SMALL B &#x1D44F;
𝑐    U+1D450 Dec:119888       MATHEMATICAL ITALIC SMALL C &#x1D450;
𝑑    U+1D451 Dec:119889       MATHEMATICAL ITALIC SMALL D &#x1D451;
𝑒    U+1D452 Dec:119890       MATHEMATICAL ITALIC SMALL E &#x1D452;
𝑓    U+1D453 Dec:119891       MATHEMATICAL ITALIC SMALL F &#x1D453;
𝑔    U+1D454 Dec:119892       MATHEMATICAL ITALIC SMALL G &#x1D454;
𝑖    U+1D456 Dec:119894       MATHEMATICAL ITALIC SMALL I &#x1D456; # what?!
𝑗    U+1D457 Dec:119895       MATHEMATICAL ITALIC SMALL J &#x1D457;
𝑘    U+1D458 Dec:119896       MATHEMATICAL ITALIC SMALL K &#x1D458;
𝑙    U+1D459 Dec:119897       MATHEMATICAL ITALIC SMALL L &#x1D459;
𝑚    U+1D45A Dec:119898       MATHEMATICAL ITALIC SMALL M &#x1D45A;
𝑛    U+1D45B Dec:119899       MATHEMATICAL ITALIC SMALL N &#x1D45B;
𝑜    U+1D45C Dec:119900       MATHEMATICAL ITALIC SMALL O &#x1D45C;
  ...

I would naturally expect U+1D455 to be MATHEMATICAL ITALIC SMALL H. But it does not seem to be defined in any table I have looked at.

Why are there holes in the Unicode table? (also U+1D49D, U+1D53A, etc.)
Is there any way I can fill them?

[EDIT]: These links do state:

The "holes" in the alphabetic ranges are filled by previously defined characters in the Letter like Symbols block shown below.

and

The Unicode Consortium adds new codepoints to the standard all the time. Visit their website to find out about pending codepoints and whether this one is in the pipe. The following table shows typical representations of how the codepoint would look, if it existed. This may help you when debugging, but is not of real use otherwise.

But I just... don't understand what they mean :\

asked Nov 09 '17 15:11

iago-lito

1 Answer

From the comments (cheers guys), I have learnt that these holes exist because some characters were already assigned in Unicode when the full alphabets were added.

For instance: before the U+1D4* MATHEMATICAL ITALIC SMALL * code points were defined, ℎ was already in the table as

ℎ    U+210E Dec:008462        PLANCK CONSTANT &planckh; # here it is

So, to keep the numbering consistent without encoding ℎ a second time, a hole was left at position U+1D455.
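The hole and the character that stands in for it can be checked from a program. This is only a minimal sketch, assuming Python's standard `unicodedata` module (the thread itself does not use it); the variable names are made up for illustration.

```python
import unicodedata

hole = chr(0x1D455)      # where MATHEMATICAL ITALIC SMALL H would have been
planck = chr(0x210E)     # the character that was encoded earlier

# Unassigned code points have no name and general category "Cn".
print(unicodedata.name(hole, None))      # None
print(unicodedata.category(hole))        # Cn
print(unicodedata.name(planck))          # PLANCK CONSTANT

# The neighbours of the hole are ordinary assigned letters.
print(unicodedata.name(chr(0x1D454)))    # MATHEMATICAL ITALIC SMALL G
print(unicodedata.name(chr(0x1D456)))    # MATHEMATICAL ITALIC SMALL I
```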

Similarly, ℬ is encoded as U+212C SCRIPT CAPITAL B, so U+1D49D in the MATHEMATICAL SCRIPT CAPITAL letters family is left reserved.

Similarly, ℂ from the MATHEMATICAL DOUBLE-STRUCK CAPITAL letters family is not U+1D53A, because it was already encoded as U+2102 DOUBLE-STRUCK CAPITAL C.

This was a difficult choice, having to balance backward compatibility, consistency and reliability all at once :)
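As for "filling" the holes: you cannot assign anything to the reserved code points yourself, but when generating a styled alphabet by arithmetic you can substitute the earlier character wherever the computed code point is a hole. A rough sketch of that idea follows. The helper `styled_letter` and the table `HOLE_FILLERS` are hypothetical names, and the table lists only the three holes mentioned in this thread, not all of them.

```python
import unicodedata

# Only the three holes named in this thread; the real list is longer.
HOLE_FILLERS = {
    0x1D455: 0x210E,  # italic small h        -> PLANCK CONSTANT
    0x1D49D: 0x212C,  # script capital B      -> SCRIPT CAPITAL B
    0x1D53A: 0x2102,  # double-struck capital C -> DOUBLE-STRUCK CAPITAL C
}

def styled_letter(base, first_letter, letter):
    """Map `letter` into the styled alphabet whose `first_letter` sits at code point `base`."""
    cp = base + (ord(letter) - ord(first_letter))
    # If the computed position is a known hole, use the earlier character instead.
    return chr(HOLE_FILLERS.get(cp, cp))

italic = "".join(styled_letter(0x1D44E, "a", c) for c in "abcdefghij")
print([unicodedata.name(c) for c in italic[6:9]])
# ['MATHEMATICAL ITALIC SMALL G', 'PLANCK CONSTANT', 'MATHEMATICAL ITALIC SMALL I']
```

The same helper covers the other two families, for example `styled_letter(0x1D49C, "A", "B")` for script capitals and `styled_letter(0x1D538, "A", "C")` for double-struck capitals.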

answered Sep 23 '22 12:09

iago-lito
