Some fonts on Google Web Fonts support multiple "character sets". The thing is, if the web font I use only serves the "latin" glyphs, users who translate the page to a language whose glyphs aren't supported will clearly notice the messed up text.
I'd like my web fonts to support the most popular languages in the world aside from English, for example, Spanish, German, French, etc.
For this purpose, I'd like to know exactly which languages the "latin" and "latin-extended" character sets each cater to.
I expect the answer to look like:
Latin Character Set & Supported Languages:
- ..........
- ..........
- ..........

Latin-Extended Character Set & Supported Languages:
- ..........
- ..........
- ..........
I couldn't find this info in Google Web Fonts documentation, or by Googling.
Google's "Latin" set, aka the Unicode Latin1-Supplement block (U+0080 to U+00FF), is meant to support primarily Western European languages (French, German and Spanish, as you mentioned, and also Portuguese, Italian, Irish, Icelandic, the languages of the Scandinavian countries and, unintentionally, the other languages mentioned in the list below).

English is supported by standard ASCII. ASCII (the first 128 characters, 95 of which are graphemes, U+0020 to U+007E) was placed as the very first block in Unicode, named Basic Latin. This block is considered a part of "Latin" and is usually supported even in non-Latin fonts, which allows them to be used as system fonts (most non-localized low-level programs have ASCII hardcoded).
Latin Extended on Google fonts means, practically, the Latin-Extended-A block (U+0100 to U+017F), which (combined with "Latin") should support all European Latin-written texts.

The Internet emerged in the USA, so ASCII was its native code. Then the ISO-8859-1 (Latin 1) standard for the upper half of 8-bit code pages was defined to support Western Europe, and it later became the Latin1-Supplement Unicode block. The other 8-bit ISO-8859 European Latin standards (Latin 2 East, Latin 3 South, Latin 4 North) were merged and moved to the Latin-Extended-A block. These Latin standards shared many characters with Latin 1, so almost all European languages in the "Latin-Extended" range (except for Maltese, Latvian and Lithuanian) also require Latin1-Supplement. This means that a "Latin-Extended" font is usually, but not necessarily, a superset of the "Latin" category.
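To make the overlap between the old 8-bit standards and the Unicode blocks concrete, here is a minimal Python sketch. It decodes the upper half of ISO-8859-2 (Latin 2) and counts where each character lands in Unicode. The helper names `block_of` and `latin2_distribution` are made up for this illustration; the block boundaries are the ones quoted above.

```python
from collections import Counter


def block_of(ch):
    """Name the Unicode block of a character (only the blocks discussed here)."""
    cp = ord(ch)
    if cp <= 0x007F:
        return "Basic Latin"
    if cp <= 0x00FF:
        return "Latin-1 Supplement"
    if cp <= 0x017F:
        return "Latin Extended-A"
    return "other"


def latin2_distribution():
    """Count the Unicode blocks hit by bytes 0xA0..0xFF of ISO-8859-2."""
    # 0xA0..0xFF is the printable upper half, the part that differs between
    # the individual ISO-8859 code pages.
    upper_half = bytes(range(0xA0, 0x100)).decode("iso8859_2")
    return Counter(block_of(ch) for ch in upper_half)


if __name__ == "__main__":
    for block, count in sorted(latin2_distribution().items()):
        print(f"{block}: {count}")
```

Running it should show the Latin 2 characters split between Latin-1 Supplement and Latin Extended-A, which is why a text in such a language typically needs both Google sets.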
In Unicode, there is also the Latin-Extended-B block, which added support mostly for non-European Latin alphabets, plus Azeri Ə and Romanian Ș, Ț (to fix a previous mistake). These characters are often replaced with Ä (from Latin1-Supplement) and Ş, Ţ (from Extended-A), although my Romanian friend told me that this is an unacceptable substitute. The block also includes Vietnamese Ơ, Ư (but Vietnamese has its own category on Google fonts) and letters for some African languages, which also require the Latin-Extended-Additional block.
African Latin languages are mostly not supported by Google's Latin Extended category (the list of compatible Google fonts is below). There are even more exotic C, D and E extensions (252 characters total) containing outdated and today mostly useless letters and symbols. This table sums it up (not 100% correct, just to give an idea of each block's main intention):
--------------------------------------------------------------------
| Unicode Latin Set         | Latin Support      | Google Name     |
|==================================================================|
| Basic Latin (aka ASCII)   | English            |                 |
| Latin1-Supplement         | Western European   | Latin           |
|------------------------------------------------------------------|
| Latin Extended A          | European based     | Latin Extended  |
|------------------------------------------------------------------|
| Latin Extended B          | non-European       | Vietnamese      |
|------------------------------------------------------------------|
| Latin Extended Additional | African            |                 |
|------------------------------------------------------------------|
| Latin Extended C, D, E    | Historical, Exotic |                 |
--------------------------------------------------------------------
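As a rough way to apply the table, the following Python sketch reports which Unicode Latin blocks a given text touches. The function name `blocks_used` is hypothetical; the block ranges are the standard Unicode ones. The mapping from a block to a Google category is as approximate as the table itself.

```python
# (start, end, name) for the Latin blocks listed in the table above.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x1E00, 0x1EFF, "Latin Extended Additional"),
]


def blocks_used(text):
    """Return the set of block names that the characters of `text` fall into."""
    found = set()
    for ch in text:
        cp = ord(ch)
        for start, end, name in BLOCKS:
            if start <= cp <= end:
                found.add(name)
                break
        else:
            # Anything outside the listed Latin blocks (punctuation, Greek, ...).
            found.add("other")
    return found


if __name__ == "__main__":
    # "Łódź" written with escapes so the code points are unambiguous.
    print(sorted(blocks_used("\u0141\u00f3d\u017a")))
```

Note that the text should be in precomposed (NFC) form for this check to be meaningful; a letter typed as base character plus combining mark would show up as "other".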
Most authors build their fonts by Unicode blocks, but some only support chosen languages. If those languages use some characters from the Latin Extended A block, Google places the font in the Latin Extended category. For example, the Lato font supports only Polish characters (the author is a Pole), yet it is in Google's "Latin Extended" category and there is no information about this on the web. (There is now a Glyphs tab in the font details, but it doesn't display all glyphs in the font.)
The "language" filter on Google fonts is rather confused and unclear: it contains Devanagari (which is not a language, but a writing system and a Unicode block), "Latin" and "Latin Extended" (which are not languages, but Google's pseudoblocks) and some languages that use some characters from other blocks. There is no clear separation between block support and language support, nor any indication of whether the support is full or partial. For the time being, the only way to find this out is to try to display the characters from the list below.
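If you have the font file, the same check can be done without eyeballing rendered text. Below is a minimal Python sketch, assuming you already have the set of code points the font covers; a font library can provide it (for example, I believe fontTools' `TTFont.getBestCmap()` returns a mapping keyed by code point, but that part is not shown or tested here). The function name `missing_characters` and the `latin_only` coverage set are hypothetical.

```python
def missing_characters(text, supported_codepoints):
    """Return characters of `text` not covered, in order of first appearance."""
    seen = set()
    missing = []
    for ch in text:
        if ord(ch) not in supported_codepoints and ch not in seen:
            seen.add(ch)
            missing.append(ch)
    return missing


if __name__ == "__main__":
    # Hypothetical coverage: printable ASCII plus Latin-1 Supplement only.
    latin_only = set(range(0x20, 0x7F)) | set(range(0xA0, 0x100))
    # A Czech test phrase ("Příliš žluťoučký kůň"), written with escapes.
    sample = "P\u0159\u00edli\u0161 \u017elu\u0165ou\u010dk\u00fd k\u016f\u0148"
    print(missing_characters(sample, latin_only))
```

An empty result only means every code point is mapped in the font; it says nothing about how good the glyphs look.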
From the list of latin-written alphabets below inspected on Omniglot and other sources, I do not count:
Please comment if something important is missing or if some minority language is used in electronic communication. Official major country-wide languages are in bold. The list includes languages spoken by at least hundreds of thousands of people.
ASCII (Basic Latin, often supported even in non-latin fonts)
Classical Latin, Aymara (Bolivia), Afrikaans (South Africa), Asturian (Spain), Corsu (France), Dutch, Fijian, English, Greenlandic, Gaelic (Scotland), Gilbertese (Kiribati), Haitian, Hiligaynon (Philippines), Lombard (Italy), Malay, Shona (Zimbabwe), Sicilian, Swahili (central Africa).
Latin
Latin Extended
Latin Extended, African (mostly not supported in Latin-Extended fonts). The Africa alphabet is fully supported by Ubuntu, Fira Sans, EB Garamond, Tinos, News Cycle, Didact Gothic, M Plus, Sawarabi, Cousine, Caudex, Judson, Andika (and of course Noto, see below).
Alternatively, the font may support the Combining Diacritical Marks block (U+0300 to U+036F). For example, Ř can be typed either as U+0158 (a precomposed character) or as R + U+030C. A program supporting Unicode should display and treat both the same and provide some API to deal with them, like String.normalize() to decompose diacritics. But if the program or font doesn't support the combination, the combining diacritical mark might end up a bit misplaced (like a too-low umlaut on Ɛ̈; it seems to be fixed in this font). See this very detailed Unicode Q&A on this topic.
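The precomposed and combining forms can be compared in a few lines. This is a small sketch in Python, whose `unicodedata.normalize` plays the same role as JavaScript's String.normalize(); the variable names are only for illustration.

```python
import unicodedata

precomposed = "\u0158"   # Ř as a single code point
combining = "R\u030c"    # R followed by U+030C COMBINING CARON

# The two strings look the same but are different code point sequences.
print(precomposed == combining)

# NFC composes to the precomposed form, NFD decomposes into base + mark.
print(unicodedata.normalize("NFC", combining) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == combining)

# Open E with diaeresis has no precomposed code point, so even after NFC it
# stays as two code points and the font has to position the mark itself.
open_e = unicodedata.normalize("NFC", "\u0190\u0308")
print(len(open_e))
```

Normalizing both sides to the same form before comparing or searching text avoids surprises when users type the same letter in different ways.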
Many Latin fonts support some characters outside of Latin scope, as they are common in Latin texts, namely:
If your font doesn't support them, I recommend checking how it combines with a fallback font, for example with this sentence (copy and paste it including the bullet sign):
• “We sell ‘cheap’ capacitors in range μF–mF, 2€ per pack”
You might want to customize some fonts (if their licence allows it) with the Font Squirrel service, or use them as a backup.
Fonts with an extensive number of characters:
If you really like a font that lacks support for some diacritics, it is quite easy to add the support using FontForge. In that case read the font license carefully: from the legal point of view, a font is software.