
How to get width of a truetype font character in 1200ths of an inch with Python?

I can get the height and width of a character in pixels with PIL (see below), but (unless I'm mistaken) pixel size depends on the screen's DPI, which can vary. Instead what I'd like to do is calculate the width of a character in absolute units like inches, or 1200ths of an inch ("wordperfect units").

>>> # Getting pixels width with PIL
>>> font = ImageFont.truetype('/blah/Fonts/times.ttf' , 12)
>>> font.getsize('a')
(5, 14)

My reason for wanting to do this is to create a word-wrapping function for writing binary Word Perfect documents. Word Perfect requires soft linebreak codes to be inserted at valid points throughout the text, or the file will be corrupt and unopenable. The question is where to add them for variable width fonts.

I realize that I don't fully understand the relationship between pixels and screen resolution and font sizes. Am I going about this all wrong?

twneale asked Nov 16 '10 02:11


1 Answer

Text widths are usually calculated in typographer's points. For font definitions the point is defined as exactly 1/72 of an inch, so a width in points converts easily to any other unit.
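As a rough sketch of those conversions, assuming 72 points per inch and the 1200 WordPerfect units per inch mentioned in the question (the function names here are made up for illustration):

```python
POINTS_PER_INCH = 72       # the point used in font definitions is 1/72 inch
WP_UNITS_PER_INCH = 1200   # "WordPerfect units", as described in the question
CM_PER_INCH = 2.54


def points_to_inches(points):
    """Convert a length in points to inches."""
    return points / POINTS_PER_INCH


def points_to_cm(points):
    """Convert a length in points to centimetres."""
    return points * CM_PER_INCH / POINTS_PER_INCH


def points_to_wp_units(points):
    """Convert a length in points to 1200ths of an inch."""
    return points * WP_UNITS_PER_INCH / POINTS_PER_INCH


print(points_to_inches(72))     # 1.0
print(points_to_wp_units(12))   # 200.0
```

None of this depends on screen DPI; pixels only enter the picture when the text is rasterized for a particular device.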

To get the design width of a character (expressed in font design units, which are fractions of the em), you need access to the low-level data of the font. The easiest way is to install fontTools (pip install fonttools), which has everything needed to work with font data at the table level.

With fontTools installed, you can:

  1. load the font data – this requires the path to the actual font file;

  2. map characters to glyphs – character widths are stored as glyph widths, meaning you must retrieve a 'character-to-glyph' mapping; this is in the cmap table of a font:

    a. load the cmap for your font. The most useful is the Unicode map – a font may contain others.

    b. load the glyph set for your font. This is a dictionary-like collection of the glyphs in that font, keyed by glyph name.

  3. Then, for each Unicode character, first look up its glyph name in the cmap and then use that name to retrieve the glyph's width in design units.

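  4. Don't forget that design units are relative to the font's units-per-em value, which is stored in its head table. This is commonly 1000 (typical for Type 1 fonts) or 2048 (typical for TrueType fonts), but it can be any other value. To get a width in points, multiply the width in design units by the point size and divide by the units-per-em value.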
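The scaling in step 4 can be sketched on its own, without loading a font. The function name and the glyph widths below are made up for illustration; only the formula matters:

```python
def design_units_to_points(width_units, point_size, units_per_em):
    """Scale a width in font design units to points at the given point size."""
    return width_units * point_size / units_per_em


# A glyph half an em wide is 6 points wide at 12 pt, whatever the em resolution:
print(design_units_to_points(1024, 12, 2048))  # 6.0  (2048 units per em)
print(design_units_to_points(500, 12, 1000))   # 6.0  (1000 units per em)
```

This is why the units-per-em value has to be read from the font rather than assumed.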

That leads to this function:

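from fontTools.ttLib import TTFont

# Path to the font file; adjust for your system.
font = TTFont('/Library/Fonts/Arial.ttf')
cmap = font['cmap']
# Unicode cmap (platform 3, encoding 1): maps code points to glyph names.
t = cmap.getcmap(3,1).cmap
# Glyph set: maps glyph names to glyphs, each with a width in design units.
s = font.getGlyphSet()
units_per_em = font['head'].unitsPerEm

def getTextWidth(text,pointSize):
    """Return the width of text in points, without kerning."""
    total = 0
    for c in text:
        if ord(c) in t and t[ord(c)] in s:
            total += s[t[ord(c)]].width
        else:
            # Character not in the font: use the width of the .notdef glyph.
            total += s['.notdef'].width
    # Scale from design units to points.
    total = total*float(pointSize)/units_per_em
    return total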

text = 'This is a test'

width = getTextWidth(text,12)

print ('Text: "%s"' % text)
print ('Width in points: %f' % width)
print ('Width in inches: %f' % (width/72))
print ('Width in cm: %f' % (width*2.54/72))
print ('Width in WP Units: %f' % (width*1200/72))

The result is:

Text: "This is a test"
Width in points: 67.353516
Width in inches: 0.935465
Width in cm: 2.376082
Width in WP Units: 1122.558594

This matches what Adobe InDesign reports. (Note that kerning between character pairs is not applied here! That would require a lot more code.)
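To give an idea of what pair kerning involves, here is a minimal sketch that only covers the simplest case: a flat table of (left glyph, right glyph) adjustments in design units, as found in a legacy format 0 kern table. Many modern fonts keep their kerning in the GPOS table instead, which this does not handle. The function name, the advances and the kerning values are all hypothetical, not taken from a real font:

```python
def kerned_width_units(glyph_names, advances, kerning):
    """Sum advance widths plus pair-kerning adjustments, in design units.

    glyph_names: list of glyph names, in text order
    advances:    dict mapping glyph name -> advance width
    kerning:     dict mapping (left name, right name) -> adjustment
    """
    total = sum(advances[name] for name in glyph_names)
    # Walk adjacent pairs; pairs without an entry get no adjustment.
    for left, right in zip(glyph_names, glyph_names[1:]):
        total += kerning.get((left, right), 0)
    return total


# Hypothetical values for illustration only.
advances = {"A": 1366, "V": 1366}
kerning = {("A", "V"): -152, ("V", "A"): -152}

print(kerned_width_units(["A", "V", "A"], advances, kerning))  # 3794
```

With fontTools, a dictionary of this shape should be available as font['kern'].kernTables[0].kernTable when the font has a format 0 kern table, but check that for your font before relying on it.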

Characters that are not defined in the font do not raise an error; as is usually done, the width of the .notdef glyph is used for them instead. If you want a missing character reported as an error, remove the if test in the function, so that the failed lookup raises a KeyError.
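If you want both behaviours, one option is to make the fallback explicit. This is only a sketch with a hypothetical helper name; cmap stands for a dictionary from code point to glyph name, like the Unicode cmap used in the function:

```python
def glyph_names_for(text, cmap, strict=False):
    """Map each character of text to a glyph name.

    With strict=False, characters missing from cmap map to '.notdef'.
    With strict=True, the first missing character raises ValueError.
    """
    names = []
    for ch in text:
        name = cmap.get(ord(ch))
        if name is None:
            if strict:
                raise ValueError("no glyph for U+%04X" % ord(ch))
            name = ".notdef"
        names.append(name)
    return names


# Tiny stand-in cmap for illustration; a real one comes from the font.
demo_cmap = {ord("a"): "a", ord("b"): "b"}
print(glyph_names_for("abc", demo_cmap))  # ['a', 'b', '.notdef']
```

The resulting names can then be looked up in the glyph set to get the widths.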

The cast to float in getTextWidth makes the division behave the same under both Python 2.7 and 3.5. Note that if you use Python 2.7 with characters outside plain ASCII, you need to pass the text as a unicode string (for example, by decoding UTF-8 bytes first), so that the loop iterates over characters rather than bytes.
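For the word-wrapping goal in the question, a width function like this can drive a simple greedy line breaker. The sketch below is an assumption about how you might use it, not anything WordPerfect-specific: wrap_words is a hypothetical helper, measure is any function returning a width for a string, and the stand-in measure in the example just counts characters:

```python
def wrap_words(text, max_width, measure):
    """Greedily break text into lines no wider than max_width.

    measure(s) must return the width of string s in the same unit as
    max_width. A single word wider than max_width gets a line to itself.
    """
    lines = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and measure(candidate) > max_width:
            # The word does not fit: close the line and start a new one.
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


# Stand-in measure for illustration: every character is 10 units wide.
print(wrap_words("This is a test of wrapping", 100, lambda s: len(s) * 10))
# ['This is a', 'test of', 'wrapping']
```

In practice measure would wrap getTextWidth at your point size, converted to WP units, and the boundaries between the returned lines are where the soft line break codes would go.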

Jongware answered Oct 23 '22 20:10