I'm interested in knowing where font fallback fits in the font shaping/rendering stack. In other words, at what point are missing glyphs detected and how are they substituted?
I see in this document that the FontConfig tool does font fallback "based on glyph coverage transparently."
So the questions are:
Edit: I found this document which explains the "what" of FontConfig, but not the "how." Question 1 is about the "how."
To summarize, this post is really about only one thing: how font fallback works when glyphs are missing from a font.
Font fallback in browsers (as opposed to, say, in an OS) is based on two things:
The CSS spec is fairly trivial in this respect: it simply gives the list of fonts using their system names, but also several possible "catch all" (generic) fonts that are in no way guaranteed to be the same from computer to computer (there is no reason to assume that "serif" maps to Times or Times New Roman, for instance).
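To make the "no guarantee" point concrete, here is a minimal sketch of turning a CSS font-family list into concrete fonts. The two systems, their generic-family mappings and the installed-font sets are all hypothetical; real browsers take these mappings from OS and user settings, and CSS font matching actually happens per character, so this only shows how the same stylesheet can yield different fonts on different machines.

```python
# Hypothetical generic-family mappings for two imaginary systems; real
# browsers take these from OS and user settings, not from a fixed table.
GENERIC_MAPS = {
    "system_a": {"serif": "Times New Roman", "sans-serif": "Arial"},
    "system_b": {"serif": "DejaVu Serif", "sans-serif": "DejaVu Sans"},
}

# The CSS generic family keywords this sketch recognises.
GENERIC_FAMILIES = {"serif", "sans-serif", "monospace", "cursive", "fantasy"}


def resolve_families(family_list, installed, generic_map):
    """Turn a CSS font-family list into concrete fonts to try, in order.

    Named families are kept only if installed; generic families are replaced
    by whatever this system maps them to. Duplicates are dropped.
    """
    resolved = []
    for family in family_list:
        if family in GENERIC_FAMILIES:
            concrete = generic_map.get(family)
        else:
            concrete = family if family in installed else None
        if concrete is not None and concrete not in resolved:
            resolved.append(concrete)
    return resolved


families = ["Helvetica Neue", "serif"]
print(resolve_families(families, {"Arial", "Times New Roman"}, GENERIC_MAPS["system_a"]))
# ['Times New Roman']
print(resolve_families(families, {"DejaVu Serif"}, GENERIC_MAPS["system_b"]))
# ['DejaVu Serif']
```

The same family list ends in Times New Roman on one machine and DejaVu Serif on the other, which is exactly why "serif" cannot be relied on to mean a particular font.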
The fallback algorithm used by text engines is entirely up to the engine, but usually kicks in during the glyph lookup step. The text engine sees a string of code points and tries to use a font to shape that string. For each point in the sequence, it checks whether the font has a matching glyph (by consulting the cmap table and its subtables), or a rule, through the GSUB mechanism, that tells the engine there may be a glyph to use only if more code points follow. For instance, a font might have no glyphs for the individual letters e, t and c, but have a glyph for & and a GSUB rule saying that the sequence e+t+c should be replaced in-text with the single glyph &. When the engine has finished accumulating this kind of "unit of points", it shapes the text and hands it back to whatever asked it to shape text.
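The lookup step above can be sketched with a toy font model. ToyFont and lookup_units are hypothetical names, not a real font API, and the model follows the e+t+c example literally by matching character sequences; in real OpenType, GSUB lookups operate on glyph IDs that were first obtained through the cmap table.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFont:
    """Hypothetical stand-in for a font; not a real font-parsing API."""
    name: str
    cmap: dict  # character -> glyph id
    # Character sequence -> single glyph id, loosely modelling a GSUB ligature.
    ligatures: dict = field(default_factory=dict)


def lookup_units(font, text):
    """Split text into (chars, glyph_id) units; glyph_id is None if missing.

    At each position the longest matching ligature sequence wins; if none
    matches, the single character is looked up in the cmap.
    """
    units = []
    i = 0
    max_len = max((len(k) for k in font.ligatures), default=1)
    while i < len(text):
        # Try multi-character sequences, longest first.
        for n in range(min(max_len, len(text) - i), 1, -1):
            seq = tuple(text[i:i + n])
            if seq in font.ligatures:
                units.append((text[i:i + n], font.ligatures[seq]))
                i += n
                break
        else:
            # No ligature matched: plain one-character cmap lookup.
            units.append((text[i], font.cmap.get(text[i])))
            i += 1
    return units


font = ToyFont("Demo", cmap={"x": 1}, ligatures={("e", "t", "c"): 2})
print(lookup_units(font, "xetc"))  # [('x', 1), ('etc', 2)]
print(lookup_units(font, "xet"))   # [('x', 1), ('e', None), ('t', None)]
```

With "etc" the whole sequence resolves to the & glyph; with only "et" nothing matches, which is the "there is no glyph" situation that triggers fallback.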
If, during glyph lookup, it turns out the font doesn't contain anything that lets the engine shape a particular code point (i.e. running through the cmap data as well as the GSUB rules still shows "there is no glyph"), then the text engine can do one of two things:
1. Give up, use the .notdef outline defined as glyph id 0, and generally give you text with lovely empty boxes (lovingly called "tofu" by font folks) or question marks.
2. Use font fallback: serve up the part it could shape, then look for the missing glyph in another font.

When using fallback, an engine can go down a list of alternative fonts until either (a) a glyph is found, or (b) the list is exhausted, at which point the engine has to give up and will use the .notdef glyph. Whether the engine grabs the .notdef glyph from the original font or from the last font in the list is entirely up to the engine (although usually it'll go with the first font, for legibility).
There is no "standard" algorithm for this defined anywhere; font fallback is basically a convenience mechanism offered by text engine authors, like how browsers come with bookmark managers (handy, and not part of any spec). As far as OpenType is concerned, there are no requirements on whether an engine should just serve up .notdef when a glyph is not found, or whether it should serve up the part it could shape, then find the missing glyph somewhere else, and render text that way. CSS implies that your text engine should have at least some form of font fallback, but it doesn't specify how it should work, or when it should kick in.
On Windows:
Firefox has different algorithms for CJK and non-CJK glyphs:
The non-CJK algorithm is very simple: try all the fonts configured for the page's HTML language. These include both the font.name.{generic}.{language} setting and the font list in the font.name-list.{generic}.{language} setting.
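A minimal sketch of that non-CJK lookup, assuming the preferences are available as a plain dictionary and that font.name-list values are comma-separated. The preference values, the "x-western" language group used in the example and the coverage table are illustrative assumptions, and candidate_fonts and first_covering_font are hypothetical helpers, not Firefox functions.

```python
def candidate_fonts(prefs, generic, language):
    """Fonts to try for one generic family and language: the
    font.name.{generic}.{language} value first, then the comma-separated
    font.name-list.{generic}.{language} entries, without duplicates."""
    names = []
    primary = prefs.get(f"font.name.{generic}.{language}")
    if primary:
        names.append(primary)
    name_list = prefs.get(f"font.name-list.{generic}.{language}", "")
    for name in name_list.split(","):
        name = name.strip()
        if name and name not in names:
            names.append(name)
    return names


def first_covering_font(ch, prefs, generic, language, coverage):
    """coverage is a hypothetical dict: font name -> set of supported chars."""
    for name in candidate_fonts(prefs, generic, language):
        if ch in coverage.get(name, set()):
            return name
    return None


# Illustrative preference values, not Firefox defaults.
prefs = {
    "font.name.serif.x-western": "Times New Roman",
    "font.name-list.serif.x-western": "Times New Roman, Georgia, Cambria",
}
coverage = {"Times New Roman": {"a"}, "Cambria": {"a", "\u2200"}}
print(candidate_fonts(prefs, "serif", "x-western"))
# ['Times New Roman', 'Georgia', 'Cambria']
print(first_covering_font("\u2200", prefs, "serif", "x-western", coverage))
# Cambria
```

Because the list comes entirely from preferences, editing those two settings is enough to change which font supplies a missing glyph.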
CJK is by nature complicated due to the sheer number of glyphs, encodings and language variations. Firefox uses a dynamic search algorithm to resolve the glyphs; its search order includes, in sequence:
- Japanese (ja) fonts
- Korean (ko) fonts
- Simplified Chinese (zh-CN) fonts
- Traditional Chinese, Hong Kong (zh-HK) fonts
- Traditional Chinese, Taiwan (zh-TW) fonts

The algorithm is currently implemented in GetLangPrefs(). In both the CJK and non-CJK cases, there is a limit on how many fonts are searched (32). The script search order is hard-coded and thus can't be configured by the user at the moment.
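The ordering and the 32-font cap can be sketched like this. It only uses the ja, ko, zh-CN, zh-HK, zh-TW sequence recoverable from the answer; it is not Firefox's GetLangPrefs(), any steps the real code performs before these are omitted, and the font names and coverage table are hypothetical.

```python
MAX_FONTS = 32  # search limit mentioned in the answer

# Language order as recoverable from the answer (possibly incomplete).
CJK_ORDER = ["ja", "ko", "zh-CN", "zh-HK", "zh-TW"]


def cjk_search_list(fonts_by_lang, order=CJK_ORDER, limit=MAX_FONTS):
    """Concatenate each language's configured fonts in the fixed order,
    skipping duplicates and stopping once the limit is reached."""
    result = []
    for lang in order:
        for name in fonts_by_lang.get(lang, []):
            if name not in result:
                result.append(name)
                if len(result) == limit:
                    return result
    return result


def pick_font(ch, fonts_by_lang, coverage):
    """Return the first font in the search list that covers ch (or None)."""
    for name in cjk_search_list(fonts_by_lang):
        if ch in coverage.get(name, set()):
            return name
    return None


# Hypothetical configuration and coverage.
fonts_by_lang = {"ja": ["JaMincho", "JaGothic"], "ko": ["KoGothic", "JaGothic"], "zh-CN": ["ScSans"]}
coverage = {"JaGothic": {"字"}, "KoGothic": {"字", "한"}}
print(cjk_search_list(fonts_by_lang))  # ['JaMincho', 'JaGothic', 'KoGothic', 'ScSans']
print(pick_font("字", fonts_by_lang, coverage))  # JaGothic
```

Since Japanese fonts come first, a Han character on an untagged Korean page is drawn with a Japanese font even though a Korean font that covers it is installed, which is the inconsistency discussed next.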
The advantage of Firefox's fallback algorithm is that, thanks to its dynamic nature, more fonts are searched, minimizing the chance of the user encountering missing glyphs. Additionally, by understanding the search order, users can adjust the configuration to choose the fonts used for missing glyphs.
The disadvantage is inconsistency: because the search order is hard-coded, fonts for certain languages are prioritized on all webpages. For instance, Japanese-optimized fonts might be used on Korean webpages that lack a language tag. Also, since more fonts are tried, performance may suffer.
Unlike Firefox, Chromium takes a more static approach to searching fonts. Instead of treating CJK as a special case and going through font lists, Chromium hard-codes several "core" fonts for each script. Chromium assumes these fonts are always available, and thus only searches these fonts. The mapping from script to font can be found in InitializeScriptFontMap(). This mapping cannot be configured by the user at the moment.
The advantage of this algorithm is simplicity, consistency and performance, at the cost of flexibility and configurability.
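A sketch of such a static script-to-font lookup, for contrast with the Firefox sketches. The table contents are hypothetical placeholders (the real mapping lives in InitializeScriptFontMap() and is not reproduced here), script detection is assumed to have happened already, and the keys are ISO 15924 script codes chosen for illustration.

```python
# Hypothetical script -> core fonts table keyed by ISO 15924 codes; the
# fonts Chromium actually uses are defined in InitializeScriptFontMap().
SCRIPT_FONT_MAP = {
    "Hani": ["CoreHanFont"],
    "Hang": ["CoreHangulFont"],
    "Latn": ["CoreLatinFont"],
}


def static_fallback(ch, script, coverage):
    """Search only the core fonts registered for ch's script.

    coverage is a hypothetical dict: installed font name -> set of chars.
    Any other installed font is never consulted.
    """
    for name in SCRIPT_FONT_MAP.get(script, []):
        if ch in coverage.get(name, set()):
            return name
    return None


coverage = {"CoreHanFont": {"字"}, "OtherFont": {"한"}}
print(static_fallback("字", "Hani", coverage))  # CoreHanFont
print(static_fallback("한", "Hang", coverage))  # None
```

The lookup is a single table access plus a short list, which is where the simplicity and speed come from; the second call shows the cost, since an installed font that could render the character is ignored because it is not in the table.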
The implementation may change in the future. More detail is available at https://gist.github.com/CrendKing/c162f5a16507d2163d58ee0cf542e695.