
Get Unicode character by glyph index in a CTFontRef or CGFontRef object

CTFontRef provides excellent functions such as CTFontGetGlyphsForCharacters for mapping character(s) to glyph(s). My question is: is there a function for the inverse mapping? That is, can I get the character(s) for given glyph(s)? Since CTFontCopyCharacterSet returns all supported characters, I think there must be a nice solution.

cxa asked Feb 12 '11 10:02


2 Answers

TLDR: CTFont/CTFontRef/CTGlyph aren't sufficient - CTLine and CTRun need to get involved; and even then it's only meaningful if you have access to the original String->Glyph mapping.

I'm coming back to this a few years late in case others end up hitting this question. As alastair noted, there is no way to generically map glyphs back to characters. Simple examples: there are multiple Unicode characters for 'space', often mapped to the same glyph. The same is often true for 'micro' and Greek 'mu'.
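To make the ambiguity concrete, here is a minimal sketch that inverts a character-to-glyph table. The table and its glyph IDs are made up for illustration (they do not come from any real font), and invert is a hypothetical helper, not a Core Text function.

```swift
// Hypothetical character-to-glyph table: the glyph IDs below are invented
// for illustration and do not come from any real font.
let charToGlyph: [Unicode.Scalar: UInt16] = [
    "\u{0020}": 3,   // SPACE
    "\u{00A0}": 3,   // NO-BREAK SPACE, same glyph as SPACE
    "\u{00B5}": 87,  // MICRO SIGN
    "\u{03BC}": 87,  // GREEK SMALL LETTER MU, same glyph as MICRO SIGN
    "\u{0041}": 36,  // LATIN CAPITAL LETTER A
]

/// Inverts a character-to-glyph map. Because the forward map is many-to-one,
/// each glyph maps to a list of scalars (sorted by code point), not one scalar.
func invert(_ map: [Unicode.Scalar: UInt16]) -> [UInt16: [Unicode.Scalar]] {
    var result: [UInt16: [Unicode.Scalar]] = [:]
    for (scalar, glyph) in map {
        result[glyph, default: []].append(scalar)
    }
    return result.mapValues { $0.sorted { $0.value < $1.value } }
}

let glyphToChars = invert(charToGlyph)
for glyph in glyphToChars.keys.sorted() {
    let codes = glyphToChars[glyph]!.map { "U+" + String($0.value, radix: 16, uppercase: true) }
    print("glyph \(glyph): \(codes.joined(separator: ", "))")
}
```

With a table like this, asking "which character is glyph 3?" has two equally valid answers, which is why a glyph index alone is not enough.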

However, it is sometimes (often?) the case that you have the original string and what you really want is to know how it was mapped to glyphs. In other words: I have my string and the resulting glyphs, and for each glyph index I want the index of the character in the string that it comes from. I wrote this sample to demonstrate a way to do this. (Aside: lesson learned, Swift gets a little rough when working with some Core Foundation APIs.)

import CoreText
import AppKit

// Usage: main [/lig] [fontName]
func main(_ argv: [String])
{
    var fontName = "Zapfino"
    var useLigatures = false

    // Skip the program name, then consume the optional /lig switch.
    var args = Array(argv.dropFirst())
    if args.first == "/lig"
    {
        useLigatures = true
        args.removeFirst()
    }

    // Whatever remains first is the font name.
    if let name = args.first { fontName = name }

    var stringAttributes: [NSAttributedString.Key: Any] = [:]
    if let font = NSFont(name: fontName, size: 24.0)
        { stringAttributes[.font] = font }

    // 0 = no ligatures, 2 = all ligatures
    stringAttributes[.ligature] = useLigatures ? 2 : 0

    let string = NSAttributedString(
        string: "This is \(fontName)!",
        attributes: stringAttributes)

    let line = CTLineCreateWithAttributedString(string) // CTLine

    let runs = CTLineGetGlyphRuns(line) as! [CTRun]
    assert(runs.count == 1)

    let run = runs[0]

    let glyphCount = CTRunGetGlyphCount(run)
    print("String: \(string.string)")
    // The indices reported below are UTF-16 offsets, so report the UTF-16 length.
    print("\tStrLen: \(string.length), Count Of Glyphs: \(glyphCount)")

    // One string index per glyph: where in the string that glyph starts.
    var clusters = [CFIndex](repeating: 0, count: glyphCount)
    CTRunGetStringIndices(run, CFRange(location: 0, length: glyphCount), &clusters)

    for (idx, idxString) in clusters.enumerated()
    {
        print("Glyph @ \(idx) maps to String @ \(idxString)")
    }
}

main(CommandLine.arguments)

If you run this without params and then with /lig at the command line you will get the following output:

String: This is Zapfino!
        StrLen: 16, Count Of Glyphs: 16
Glyph @ 0 maps to String @ 0
Glyph @ 1 maps to String @ 1
Glyph @ 2 maps to String @ 2
Glyph @ 3 maps to String @ 3
Glyph @ 4 maps to String @ 4
Glyph @ 5 maps to String @ 5
Glyph @ 6 maps to String @ 6
Glyph @ 7 maps to String @ 7
Glyph @ 8 maps to String @ 8
Glyph @ 9 maps to String @ 9
Glyph @ 10 maps to String @ 10
Glyph @ 11 maps to String @ 11
Glyph @ 12 maps to String @ 12
Glyph @ 13 maps to String @ 13
Glyph @ 14 maps to String @ 14
Glyph @ 15 maps to String @ 15

$ ./main /lig
String: This is Zapfino!
        StrLen: 16, Count Of Glyphs: 7
Glyph @ 0 maps to String @ 0
Glyph @ 1 maps to String @ 2
Glyph @ 2 maps to String @ 4
Glyph @ 3 maps to String @ 5
Glyph @ 4 maps to String @ 7
Glyph @ 5 maps to String @ 8
Glyph @ 6 maps to String @ 15

I added the ligature option to help visualize that glyphs and characters can pretty easily not be 1 to 1. Here is a visual representation of the two strings: (image not shown)
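The string indices in the /lig output above can be turned into the span of characters each glyph covers. The following is a minimal sketch, not part of the original sample: characterRanges is a hypothetical helper, and it assumes a single left-to-right run whose indices are strictly increasing, as in that output. Right-to-left or reordered runs would need more care.

```swift
/// For each glyph, returns the half-open range of string indices it covers.
/// Assumes the indices are strictly increasing (one left-to-right run).
func characterRanges(forGlyphStringIndices indices: [Int], stringLength: Int) -> [Range<Int>] {
    var ranges: [Range<Int>] = []
    for (i, start) in indices.enumerated() {
        // A glyph's span ends where the next glyph's span begins,
        // or at the end of the string for the last glyph.
        let end = i + 1 < indices.count ? indices[i + 1] : stringLength
        ranges.append(start..<end)
    }
    return ranges
}

// String indices copied from the /lig run above.
let ligatureIndices = [0, 2, 4, 5, 7, 8, 15]

// The indices are UTF-16 offsets, so slice the UTF-16 view of the string.
let text = Array("This is Zapfino!".utf16)
let ranges = characterRanges(forGlyphStringIndices: ligatureIndices,
                             stringLength: text.count)

for (glyph, range) in ranges.enumerated() {
    let piece = String(decoding: text[range], as: UTF16.self)
    print("Glyph @ \(glyph) covers \(range): \"\(piece)\"")
}
```

Here glyph 5 covers indices 8..<15, the whole word "Zapfino", which the font draws as a single ligature glyph.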

Joe answered Nov 15 '22 11:11


I think you may end up having to parse the font’s mapping tables yourself. You can obtain access to the tables using CGFontCopyTableForTag(); the table you're after is the 'cmap' table, the format of which is documented here:

http://www.microsoft.com/typography/otspec/cmap.htm

and also here:

http://developer.apple.com/fonts/TTRefMan/RM06/Chap6cmap.html

Unfortunately, as you’ll discover by reading through these, the business of mapping characters to glyphs is decidedly non-trivial, and in addition any given font may have more than one mapping table (i.e. the set of characters that use a given glyph may depend on which mapping table you, or the renderer, choose).
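As a starting point, here is a minimal sketch of reading the 'cmap' header, which lists one encoding record per mapping table. It assumes the table bytes have already been copied into a [UInt8] (for example from the CFData returned by CGFontCopyTableForTag with the tag 0x636D6170, 'cmap'); that step is not shown. EncodingRecord and parseCmapEncodingRecords are hypothetical names, and the sample bytes are hand-built rather than taken from a real font. Decoding the subtables themselves is the hard part and is not attempted here.

```swift
/// One entry of the 'cmap' header: which platform/encoding a subtable serves
/// and where that subtable starts, relative to the beginning of the table.
struct EncodingRecord {
    let platformID: UInt16
    let encodingID: UInt16
    let subtableOffset: UInt32
}

/// Parses the 'cmap' header and returns its encoding records, or nil if the
/// data is too short. All fields are big-endian.
func parseCmapEncodingRecords(_ bytes: [UInt8]) -> [EncodingRecord]? {
    func u16(_ at: Int) -> UInt16 { (UInt16(bytes[at]) << 8) | UInt16(bytes[at + 1]) }
    func u32(_ at: Int) -> UInt32 { (UInt32(u16(at)) << 16) | UInt32(u16(at + 2)) }

    guard bytes.count >= 4 else { return nil }
    let numTables = Int(u16(2))              // bytes 0-1 hold the table version
    guard bytes.count >= 4 + numTables * 8 else { return nil }

    return (0..<numTables).map { i -> EncodingRecord in
        let base = 4 + i * 8                 // each record is 8 bytes long
        return EncodingRecord(platformID: u16(base),
                              encodingID: u16(base + 2),
                              subtableOffset: u32(base + 4))
    }
}

// Hand-built sample header (not from a real font): version 0, two records
// that both point at offset 20, i.e. two encodings sharing one subtable.
let sample: [UInt8] = [
    0x00, 0x00, 0x00, 0x02,
    0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00, 0x14,   // platform 0, encoding 3
    0x00, 0x03, 0x00, 0x01, 0x00, 0x00, 0x00, 0x14,   // platform 3, encoding 1
]

let records = parseCmapEncodingRecords(sample) ?? []
for r in records {
    print("platform \(r.platformID), encoding \(r.encodingID), subtable at \(r.subtableOffset)")
}
```

A font with several records is exactly the "more than one mapping table" situation: an inverse lookup first has to decide which record to trust.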

Furthermore, advanced font technology like OpenType or AAT may result in the existence of glyphs for which there is no direct mapping from characters, but that are nevertheless present in the output as a result of substitutions made by the smart font technology. Inverting the OpenType or AAT substitution mechanisms would be tricky, and might also not lead to a single Unicode code point (or indeed even a single grapheme cluster).

al45tair answered Nov 15 '22 09:11