Ligatures are Unicode characters that are represented by more than one code point. For example, in Devanagari त्र is a ligature that consists of the code points त + ् + र.
When viewed in a simple text editor such as Notepad, त्र is shown as त् + र and is stored as three Unicode characters. However, when the same file is opened in Firefox, it is shown as a proper ligature.
So my question is: how can I detect such ligatures programmatically while reading the file from my code? Since Firefox does it, there must be a way to do it programmatically. Are there any Unicode properties that contain this information, or do I need a map of all such ligatures?
The SVG CSS property text-rendering, when set to optimizeLegibility, does the same thing (combines code points into a proper ligature).
PS: I am using Java.
EDIT
The purpose of my code is to count the characters in the Unicode text assuming a ligature to be a single character. So I need a way to collapse multiple code points into a single ligature.
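For this counting use case, here is a minimal Java sketch that does not look at any font. It is a heuristic and not a standard API: the class name ConjunctCounter and the method countClusters are made up for illustration, only the Devanagari virama (U+094D) is hard-coded, and every consonant + virama + letter sequence is assumed to form one unit whether or not a given font actually draws it as a ligature. Combining marks (vowel signs, nukta and so on) are folded into the preceding character. ZWJ and ZWNJ are not handled.

```java
final class ConjunctCounter {
    // Assumption: only the Devanagari virama is treated as a joiner.
    private static final int DEVANAGARI_VIRAMA = 0x094D;

    // True for combining marks (general categories Mn, Mc, Me).
    static boolean isMark(int cp) {
        int type = Character.getType(cp);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    // Counts user-visible units, merging consonant + virama + letter runs.
    static int countClusters(String text) {
        int count = 0;
        boolean joinNext = false; // set right after a virama
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == DEVANAGARI_VIRAMA) {
                if (count == 0) count = 1; // stray virama at the start
                joinNext = true;           // the next letter joins this unit
            } else if (isMark(cp)) {
                if (count == 0) count = 1; // stray mark at the start
                joinNext = false;          // marks attach to the current unit
            } else {
                // A letter directly after a virama continues the conjunct.
                if (!(joinNext && Character.isLetter(cp))) count++;
                joinNext = false;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // TA + VIRAMA + RA: three code points, counted as one unit.
        System.out.println(countClusters("\u0924\u094D\u0930")); // prints 1
    }
}
```

java.text.BreakIterator.getCharacterInstance() is the built-in alternative for splitting text into user-perceived characters, but its treatment of conjuncts may differ between JDK versions, so check it against your target runtime before relying on it.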
The Computer Typesetting Wikipedia page says:
The Computer Modern Roman typeface provided with TeX includes the five common ligatures ff, fi, fl, ffi, and ffl. When TeX finds these combinations in a text it substitutes the appropriate ligature, unless overridden by the typesetter.
This indicates that it's the editor that does substitution. Moreover,
Unicode maintains that ligaturing is a presentation issue rather than a character definition issue, and that, for example, "if a modern font is asked to display 'h' followed by 'r', and the font has an 'hr' ligature in it, it can display the ligature."
As far as I can see (I got interested in this topic and have just been reading a few articles), the instructions for ligature substitution are embedded in the font. I dug a little further and found these for you: GSUB - The Glyph Substitution Table and the Ligature Substitution Subtable, both from the OpenType file format specification.
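As a rough illustration of where that table lives, the sketch below walks the table directory at the start of a single OpenType/TrueType font and returns the byte offset of its GSUB table. It relies only on the documented directory layout (a 12-byte header with the table count at offset 4, followed by 16-byte records of tag, checksum, offset and length). The class name GsubLocator and the method findGsubOffset are hypothetical, font collections (.ttc) and WOFF files are not handled, and it does not parse the ligature subtables themselves.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class GsubLocator {
    // Returns the byte offset of the GSUB table, or -1 if there is none.
    static long findGsubOffset(byte[] font) {
        if (font.length < 12) return -1;          // too short for the header
        ByteBuffer buf = ByteBuffer.wrap(font);   // big-endian by default
        int numTables = buf.getShort(4) & 0xFFFF; // uint16 at offset 4
        for (int i = 0; i < numTables; i++) {
            int record = 12 + 16 * i;             // start of table record i
            if (record + 16 > font.length) return -1;
            String tag = new String(font, record, 4, StandardCharsets.US_ASCII);
            if (tag.equals("GSUB")) {
                // The offset field is a uint32 stored 8 bytes into the record.
                return buf.getInt(record + 8) & 0xFFFFFFFFL;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java GsubLocator <font-file>");
            return;
        }
        byte[] font = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("GSUB offset: " + findGsubOffset(font));
    }
}
```

A result of -1 means the font carries no substitution rules at all, so a renderer using it would have nothing to form ligatures from.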
Next, you need to find a library that lets you peek inside OpenType font files, i.e. a file parser for quick access. Reading the following two discussions may give you some direction on how to do these substitutions: