Unicode codepoints to font symbol mapping?

Tags:

unicode

fonts

We know that code point 65 represents A. There is a one-to-one mapping, hence it's easy to render: take the symbol A from the font file and render it.

65 == A

Now let's consider the Hindi language. Code point 0x0924 represents त, again easy to map.

0x0924 == त

But if code point 0x0924 is immediately followed by code points 0x094d and 0x0930, which represent ् and र respectively, the result is not a combination of these 3 symbols but a new symbol, त्र.

0x0924 0x094d 0x0930 != त ् र

But

0x0924 0x094d 0x0930 == त्र
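
To confirm that the stored text really is three code points (and only becomes one shape at render time), Python's standard unicodedata module can list them. This is a small illustrative sketch; the helper name describe is hypothetical.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """List each code point in `text` as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# 'A' is a single code point.
print(ord("A"))  # 65

# The conjunct त्र is stored as three code points, even though it renders as one shape.
for line in describe("\u0924\u094d\u0930"):
    print(line)
# U+0924 DEVANAGARI LETTER TA
# U+094D DEVANAGARI SIGN VIRAMA
# U+0930 DEVANAGARI LETTER RA
```

Nothing in the string itself says "render these as one symbol"; that decision happens later, between the text engine and the font.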

My questions are:

  1. Which program is responsible for recognizing that a group of code points forms a new symbol rather than a concatenation of symbols?
  2. While typing, does the same program monitor the input and dynamically change an already rendered symbol when a new code point is appended?
  3. How are fonts created with such rules?
Pavan Kumar asked Feb 21 '16 19:02


1 Answer

Welcome to modern fonts: they're not what you think. The days of "one code point maps to one letter" are roughly 20 years behind us; for the last few decades, modern fonts have been doing far more than that. I'm going to explain this in terms of OpenType fonts, because that's the kind you're most likely using. (These are what most people call "ttf" and "otf" fonts. Yes, those are the same kind of font; they differ only in their glyph outline encoding, which is arguably the least notable part of a modern font.) With those fonts, the font pretty much controls everything, and the text engine you're relying on is simply following its instructions.
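
The "same font, different outline encoding" point can be checked from the file header. Per the OpenType specification, the first four bytes of a single font file (the sfntVersion) are 0x00010000 for TrueType outlines or the ASCII tag 'OTTO' for CFF outlines, while the other tables (cmap, GSUB, GPOS, ...) work the same way in both. The sketch below assumes a single font, not a 'ttcf' collection, and the function name outline_format is hypothetical.

```python
import struct

# sfntVersion values from the OpenType spec (first 4 bytes of a single font file).
TRUETYPE_OUTLINES = 0x00010000  # quadratic TrueType outlines ('glyf' table), usually .ttf
CFF_OUTLINES = b"OTTO"          # cubic CFF/CFF2 outlines, usually .otf


def outline_format(header: bytes) -> str:
    """Classify a single OpenType font by its 4-byte sfntVersion tag."""
    if len(header) < 4:
        raise ValueError("need at least 4 bytes")
    tag = header[:4]
    # Apple's legacy 'true' tag also means TrueType outlines.
    if struct.unpack(">I", tag)[0] == TRUETYPE_OUTLINES or tag == b"true":
        return "TrueType outlines (glyf)"
    if tag == CFF_OUTLINES:
        return "CFF outlines"
    raise ValueError(f"not a single OpenType font: {tag!r}")
```

Usage would be something like reading the first 4 bytes of a font file opened in binary mode and passing them in; everything after that header is organized the same way regardless of the answer.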

OpenType fonts have a "Character Map" (the "cmap" table) that provides the simple one-to-one mappings from input codes to glyphs ("shapes") in the font's list of available glyphs. Note that this does not define which glyphs exist in the font; it only says which glyphs are directly matched to individual character codes, such as individual ASCII bytes or Unicode code points. There can be thousands more glyphs, used for compositing, multi-codepoint substitution, and so on, that cannot be reached through the character map.
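
A toy model of that split between "glyphs that exist" and "glyphs the cmap points at" may help. The glyph order, glyph names, and mappings below are hypothetical, not taken from any real font; only the convention that glyph 0 is .notdef mirrors OpenType.

```python
# Hypothetical toy font: glyph IDs are indexes into a glyph order, as in OpenType.
GLYPH_ORDER = [".notdef", "f", "i", "ta", "virama", "ra", "f_i", "ta_ra"]

# Toy cmap: only direct one-to-one code point -> glyph ID entries.
CMAP = {
    0x0066: 1,  # 'f'
    0x0069: 2,  # 'i'
    0x0924: 3,  # DEVANAGARI LETTER TA
    0x094D: 4,  # DEVANAGARI SIGN VIRAMA
    0x0930: 5,  # DEVANAGARI LETTER RA
}


def glyph_for(cp: int) -> int:
    """Look up one code point; unmapped code points fall back to glyph 0 (.notdef)."""
    return CMAP.get(cp, 0)


def unreachable_glyphs() -> list[str]:
    """Glyphs that exist in the font but that no single code point maps to."""
    mapped = set(CMAP.values()) | {0}
    return [name for gid, name in enumerate(GLYPH_ORDER) if gid not in mapped]


print([GLYPH_ORDER[glyph_for(ord(c))] for c in "fi"])  # ['f', 'i']
print(unreachable_glyphs())                            # ['f_i', 'ta_ra']
```

The ligature glyphs are in the font, but only a later substitution step can produce them.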

Also, one font can, and usually does, contain more than one mapping, because different historical and current character sets (ASCII, EUC-KR, ISO-2022-JP, Unicode, etc.) don't use the same codes for the same letters and symbols, if they share any at all.
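
As a rough sketch of why several mappings are needed: the same Hangul syllable has one code in Unicode and a different one in EUC-KR, so a font that supports both needs a cmap subtable for each, both pointing at the same glyph. The (platformID, encodingID) pairs (3, 1) for Windows Unicode BMP and (3, 5) for Windows Wansung are OpenType's; the glyph ID and the dictionary layout are hypothetical simplifications of the real binary subtables.

```python
# Hypothetical toy font with two cmap subtables keyed by (platformID, encodingID).
HAN = "\uD55C"  # the Hangul syllable 한
GLYPH_HAN = 42  # hypothetical glyph ID for its shape


def euc_kr_code(ch: str) -> int:
    """Return the character's EUC-KR byte sequence as a single integer code."""
    return int.from_bytes(ch.encode("euc_kr"), "big")


CMAP_SUBTABLES = {
    (3, 1): {ord(HAN): GLYPH_HAN},          # keyed by Unicode code point
    (3, 5): {euc_kr_code(HAN): GLYPH_HAN},  # keyed by EUC-KR (Wansung) code
}


def lookup(subtable: tuple[int, int], code: int) -> int:
    """Resolve a character code through one subtable; unmapped codes give .notdef (0)."""
    return CMAP_SUBTABLES[subtable].get(code, 0)


print(hex(ord(HAN)), hex(euc_kr_code(HAN)))                          # two different codes
print(lookup((3, 1), ord(HAN)), lookup((3, 5), euc_kr_code(HAN)))   # the same glyph ID
```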

While mapping binary codes to other binary codes is trivially simple, the real power of modern fonts, particularly OpenType, is what happens next.

  1. OpenType has full ligature control, so the fact that code X yields glyph GX and code Y yields glyph GY in no way means that X + Y will yield GX + GY. There are quite a few different kinds of substitution possible (one-for-one, many-for-one, contextual, position-based, etc.), and they're all controlled by the GSUB table ("GSUB" for "Glyph SUBstitution"). When you type multiple Hindi characters and they form a single "letter", that's GSUB's doing. If I type "f" + "i", for instance, there's a good chance that a well-designed font shows the single ligature fi. Similarly, if you're writing Arabic, where letters take different shapes depending on where in a word they appear, that's also covered by GSUB. The GSUB table can contain hundreds of different rulesets to make sure all the languages it's intended for render correctly.
  2. Yes, but it's not a "program" so much as the font. Modern fonts are a bit like game ROMs in that you need an engine to execute them, but the font calls all the shots and contains all the logic. The text rendering engine simply says "hey font, I have this input sequence, please tell me how to turn it into outline vectors", and the font contains all the information about what needs to happen.
  3. "Using font software." This is kind of an obvious answer: good fonts are made with software that lets you do everything needed for your intended language support to work, such as FontForge, FontStudio, FontCreator, etc., plus additional tools for optimizing all the OpenType features a font needs (there are a great many).
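
To make points 1 and 2 concrete, here is a toy model of what a shaping engine does with a font's cmap and GSUB data: map each code point to a glyph, then replace glyph sequences that match a ligature rule. The glyph names and rules are hypothetical, and the longest-match loop is a simplification; real engines (HarfBuzz, DirectWrite, Core Text) do considerably more, including script-specific steps for Devanagari such as finding syllable clusters and applying features in a fixed order. Re-shaping the whole string after every keystroke is the simplest way to get the "symbol changes as I type" behavior from question 2.

```python
# Hypothetical toy font data: code point -> glyph name (a stand-in for cmap).
CMAP = {"f": "f", "i": "i", "\u0924": "ta", "\u094D": "virama", "\u0930": "ra"}

# Toy GSUB-style ligature rules: a sequence of input glyphs -> one output glyph.
LIGATURES = {
    ("f", "i"): "f_i",
    ("ta", "virama", "ra"): "ta_ra",
}
LONGEST = max(len(rule) for rule in LIGATURES)


def shape(text: str) -> list[str]:
    """Map code points to glyphs, then greedily apply the longest matching ligature."""
    glyphs = [CMAP.get(ch, ".notdef") for ch in text]
    out: list[str] = []
    i = 0
    while i < len(glyphs):
        # Try the longest possible rule first, down to two-glyph rules.
        for n in range(min(LONGEST, len(glyphs) - i), 1, -1):
            lig = LIGATURES.get(tuple(glyphs[i:i + n]))
            if lig is not None:
                out.append(lig)
                i += n
                break
        else:
            # No rule matched here: keep the glyph as-is.
            out.append(glyphs[i])
            i += 1
    return out


# Simulate typing त्र one code point at a time, re-shaping after each keystroke.
typed = ""
for ch in "\u0924\u094D\u0930":
    typed += ch
    print(shape(typed))
# ['ta']
# ['ta', 'virama']
# ['ta_ra']
```

The already-displayed "ta" and "virama" glyphs are replaced the moment the third code point makes the ligature rule match, which is exactly the dynamic change described in the question.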

Making good fonts, even just the programming of them (not taking the typeface design into account at all), is quite a specialised job.

Mike 'Pomax' Kamermans answered Jan 04 '23 16:01