
Which system component is responsible for binding Unicode ligatures in a Java application?

This is a "meta-question" which I came across when trying to find a better specification for another of my questions (Rendering Devanagari ligatures (Unicode) in Java Swing JComponent on Mac OS X).

What I don't quite understand as of yet is which "component" (for want of a better word) of a given system is responsible for displaying Unicode text in Java, and more specifically ligatures.

As far as I understand, the following components have an influence on the process:

  1. The system character encoding (for example UTF-8 on Mac OS X 10.6 and, according to akira's comment on this superuser.com post, UTF-16 on Windows 7).
  2. The Java Charset (which by default is MacRoman on Mac OS X 10.6, cp1252 on Windows 7).
  3. The font that is used to render the text, and that font's encoding information, as suggested by Donal Fellows on my other question:

    "fonts include information about what encoding they're using".

  4. Whether the font contains the characters to render at the respective Unicode code points.

So if a string of Unicode characters doesn't display correctly (as seen in my other question, see above), where would the problem most probably be? That is, which "component" (what would a better word be?) is responsible for "binding" the ligature, i.e. for composing it?

Thank you very much in advance and please let me know should you need more information.

s.d asked May 17 '11 14:05



1 Answer

That system component is called a font renderer or font rasterizer. It is responsible for converting a sequence of character codes into pixels based on the glyphs defined in a font. As other answers have stated, the various character encoding values you can get and set from Java are irrelevant here. When the JVM gives the font renderer a sequence of character codes, it tells the renderer which encoding applies (probably UTF-16, but this is transparent to the Java programmer). The font renderer then uses the encoding specified in the font file to find the corresponding glyphs.
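To make "a sequence of character codes" concrete, here is a minimal sketch that lists the code points a Java String would hand to the renderer. The sample conjunct (KA + VIRAMA + SSA) and the class and method names are my own choice for illustration, not taken from the question. The string holds three separate code points; joining them into one conjunct glyph is left to the font renderer and the font.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointDump {
    // Devanagari KA + VIRAMA + SSA, written with escapes so the
    // source file encoding does not matter (illustrative sample).
    static final String SAMPLE = "\u0915\u094D\u0937";

    // Returns the Unicode code points of the text in logical order.
    static List<Integer> codePoints(String text) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            result.add(cp);
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return result;
    }

    public static void main(String[] args) {
        for (int cp : codePoints(SAMPLE)) {
            System.out.printf("U+%04X %s%n", cp, Character.getName(cp));
        }
    }
}
```

Nothing in this listing depends on the default Charset, which is consistent with the point that those settings do not affect how the text is shaped.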

Current versions of Windows and Mac OS X come with excellent font renderers.

The first point of confusion is that the JRE comes with its own font renderer, as part of the Java2D platform, and this is what Swing uses. There ought to be an option to control whether Java uses its own renderer or the system one.

EDIT: As McDowell pointed out in a comment, on OS X you can enable the system renderer by setting the Java property apple.awt.graphics.UseQuartz=true.
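A rough sketch of setting that property from code follows. The property name comes from the comment cited above; the class and method names are hypothetical, and whether a given JRE honors the property (or honors it when set at runtime rather than at launch) is an assumption I have not verified. Passing -Dapple.awt.graphics.UseQuartz=true on the java command line is the more conservative route.

```java
public class QuartzRendererFlag {
    // Property name as given in the comment; other platforms should simply ignore it.
    static final String QUARTZ_PROPERTY = "apple.awt.graphics.UseQuartz";

    // Requests the system (Quartz) renderer. Call this before any
    // AWT or Swing class is loaded, otherwise it may come too late.
    static void requestQuartzRenderer() {
        System.setProperty(QUARTZ_PROPERTY, "true");
    }

    public static void main(String[] args) {
        requestQuartzRenderer();
        // Only touch AWT/Swing after this point.
        System.out.println(QUARTZ_PROPERTY + "=" + System.getProperty(QUARTZ_PROPERTY));
    }
}
```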

The second point of confusion is that ligatures are optional in English. A desktop publishing application will substitute an "ffl" ligature (a single glyph in the font) when it sees a word like "shuffle", but most other applications don't bother. Based on what you've said about Devanagari (and what I just read on Wikipedia) I gather the ligatures are not optional in that language.

By default, the Java2D font renderer does not apply optional ligatures. However, the Javadoc for java.awt.font.TextAttribute.LIGATURES says that ligatures required by the writing system are always enabled. If that isn't your experience, you may have found a bug in the Java2D font renderer. Meanwhile, try using the Font constructor that takes a map of font attributes, including TextAttribute.LIGATURES.
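A minimal sketch of that suggestion, assuming the standard java.awt.Font(Map) constructor and the TextAttribute.LIGATURES_ON value. The class name, helper names, font family, and size are placeholders I picked for illustration; substitute a font that actually covers Devanagari on your system.

```java
import java.awt.Font;
import java.awt.font.TextAttribute;
import java.util.HashMap;
import java.util.Map;

public class LigatureFontSketch {
    // Builds the attribute map; family and size are caller-supplied placeholders.
    static Map<TextAttribute, Object> ligatureAttributes(String family, float size) {
        Map<TextAttribute, Object> attrs = new HashMap<>();
        attrs.put(TextAttribute.FAMILY, family);
        attrs.put(TextAttribute.SIZE, size);
        // Explicitly request ligatures instead of relying on the default.
        attrs.put(TextAttribute.LIGATURES, TextAttribute.LIGATURES_ON);
        return attrs;
    }

    // Creates a font through the constructor that takes an attribute map.
    static Font ligatureFont(String family, float size) {
        return new Font(ligatureAttributes(family, size));
    }

    public static void main(String[] args) {
        Font font = ligatureFont(Font.SERIF, 18f);
        System.out.println(font.getAttributes().get(TextAttribute.LIGATURES));
    }
}
```

In a Swing component this would be applied with setFont(ligatureFont(...)); an existing Font can be adapted the same way through deriveFont with the attribute map. Whether this changes the Devanagari rendering described in the question is untested here.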

gatkin answered Oct 19 '22 03:10
