 

Delete all unused characters from a TTF-font with Fontforge

How can I use FontForge to delete all characters from a TTF font file that are not used in a given text sample? In other words, I want to create a subset of an existing font that contains only the characters that actually appear in my text. (If you know a free tool other than FontForge that can do this, that works for me too.)

Here's a small example: a text file contains the words "사슴 코끼리 당나귀". How can I delete every character from the font file that is not part of that small text sample? I want to end up with a new TTF file that contains only the used characters. The actual text is longer than this, so searching manually in FontForge is not an option.

In FontForge there are certain selection options (Main Menu > Edit > Selection), and I tried "Select by Wildcards" after converting the Korean characters into their Unicode sequences, but no luck yet.

Thanks a lot for any ideas! kind regards

EDIT: Use case: I am creating children's ebooks, which by their nature consist mainly of images. The text, however, is not part of the pictures but is displayed on an additional layer in front of the pictures. The ebook files (I am producing mainly for Amazon Kindle) consist of some metadata, the image files, layout information and of course the font files.

The Amazon Kindle publishing program has very strict file size restrictions. For a book to be sold in the price range that I'm going for, the file size must not exceed 3 megabytes. That is fine when I use a Western font set. But my ebooks are bilingual, and for the Korean edition I need to add a Korean font (in addition to the Western font). Asian font files are comparatively huge due to the nature of their alphabets / glyphs: storing 20,000 (in extreme cases up to 200,000) glyphs makes for ~7-12 megabytes per font weight. Again, my overall limit is 3 megabytes, which has to cover all the pictures and the font files (plus the layout and metadata files).

Since the text of an ebook is not altered by the reader, it is safe to discard all the glyphs from the font that are not used in my text. Not filling up the storage of the user's reading device unnecessarily is another consideration. I have already compressed the image files heavily and cannot go any further, as the quality starts suffering beyond a certain compression rate. I hope it's now clear why I think subsetting the font is a good solution.

asked Feb 22 '16 at 14:02 by oystersauce


2 Answers

I have found a way to create a subset of an existing font in FontForge on a semi-automated basis. The key was to use FontForge's scripting capabilities. I used an internet service (see link below) to get the Unicode code points of all the characters that I use in my book. The output looks like this: "\uc6d0\uc22d\uc774\uac1c\ubbf8\uacf0\ubc8c\ub3cc\uace0\ub798"

I took the output of the service and used Notepad++'s "search and replace" functionality to get the following structure for a script:

SelectMore("uc6d0")
SelectMore("uc22d")
SelectMore("uc774")
SelectMore("uac1c")
SelectMore("ubbf8")
SelectMore("uacf0")
SelectMore("ubc8c")
SelectMore("ub3cc")
SelectMore("uace0")
SelectMore("ub798")

The script just calls the same function, SelectMore(), repeatedly. This function selects the glyph passed as its argument without clearing any previous selection. Note also that this script assumes FontForge is running and has the font file open (see the link to the FontForge scripting help below). To execute a script from within FontForge, select "File -> Execute Script..." from the main menu, paste the script and hit run.
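The Notepad++ search-and-replace step could also be scripted. The following is a minimal Python sketch, not part of the original workflow: it assumes the converter output is a string of \uXXXX escapes like the one quoted above, and the function name escapes_to_script is made up for illustration.

```python
import re


def escapes_to_script(escaped):
    """Turn a string of \\uXXXX escapes into one SelectMore() line per code point."""
    # Collect the four hex digits of every \uXXXX sequence.
    codes = re.findall(r"\\u([0-9a-fA-F]{4})", escaped)
    # Lower-case the digits and drop duplicates while keeping the original order.
    unique = list(dict.fromkeys(code.lower() for code in codes))
    return "\n".join('SelectMore("u%s")' % code for code in unique)


# The converter output quoted in the answer, as a raw string.
converter_output = r"\uc6d0\uc22d\uc774\uac1c\ubbf8\uacf0\ubc8c\ub3cc\uace0\ub798"
print(escapes_to_script(converter_output))
```

The printed lines can then be pasted into the "Execute Script" dialog in the same way as the hand-made script.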

Now all the used glyphs are selected and all unused glyphs are deselected. In the main menu, hit "Edit -> Select -> Invert Selection" to select all the unused glyphs instead. Then run "Encoding -> Detach and Remove Glyphs" from the main menu to remove all the selected (unused) glyphs. Saving the result as a new font produces the required subset.
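The select, invert, remove and save steps might also be run without the GUI through FontForge's Python bindings. This is only a sketch under assumptions: it needs the fontforge Python module (which ships with FontForge, not with standard Python), the helper names used_codepoints and subset_font are hypothetical, and removeGlyph() is used here as an approximate stand-in for the "Detach and Remove Glyphs" menu command rather than a confirmed equivalent.

```python
def used_codepoints(text):
    """Return the sorted, de-duplicated code points of the printable characters in text."""
    # str.isprintable() keeps the ordinary space but drops newlines and tabs.
    return sorted({ord(ch) for ch in text if ch.isprintable()})


def subset_font(src_path, dst_path, text):
    """Write a copy of the font at src_path that keeps only the glyphs used in text."""
    # Imported here so the helper above stays usable without FontForge installed.
    import fontforge

    font = fontforge.open(src_path)
    font.selection.none()
    for cp in used_codepoints(text):
        # Assumption: skip code points that have no slot in the font's encoding.
        if font.findEncodingSlot(cp) != -1:
            font.selection.select(("more", "unicode"), cp)
    # After inverting, the selection holds every glyph that is NOT used.
    font.selection.invert()
    # Materialise the list first so glyphs are not removed while iterating.
    for glyph in list(font.selection.byGlyphs):
        font.removeGlyph(glyph)
    font.generate(dst_path)
    font.close()
```

A call might look like subset_font("korean.ttf", "korean-subset.ttf", open("book.txt", encoding="utf-8").read()), where all three file names are placeholders, run with FontForge's own interpreter (for example fontforge -script subset.py). It is worth checking the result in FontForge afterwards, since special glyphs such as .notdef are removed as well.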

  • Unicode converter: https://www.branah.com/unicode-converter
  • FontForge: https://fontforge.org
  • FontForge scripting help: https://fontforge.github.io/scripting-alpha.html

answered Nov 13 '22 at 13:11 by oystersauce


Great question/answer, user3725694. To make script generation more automatic, the following Python code may be used to generate the SelectMore() lines for the English printable characters:

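import string

# Print one SelectMore() call per character in string.printable
# (digits, letters, punctuation and whitespace such as tab and newline).
for c in string.printable:
    print('SelectMore("u%04x")' % ord(c))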

It prints:

SelectMore("u0030")
SelectMore("u0031")
...
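The same idea can be extended from the English printable characters to an arbitrary text sample, such as the Korean words from the question. This is a hedged sketch rather than part of the answer: the function name select_more_script is invented for illustration, and it keeps the lowercase "u%04x" name format that the scripts above use.

```python
def select_more_script(text):
    """Build one SelectMore() line for each distinct printable character in text."""
    seen = set()
    lines = []
    for ch in text:
        # Skip newlines, tabs and other non-printable characters, and any repeats.
        if not ch.isprintable() or ch in seen:
            continue
        seen.add(ch)
        lines.append('SelectMore("u%04x")' % ord(ch))
    return "\n".join(lines)


# The text sample from the question; a real book would be read from a UTF-8 file.
sample = "사슴 코끼리 당나귀"
print(select_more_script(sample))
```

The ordinary space counts as printable here, so it stays in the subset and the text can still be rendered with word gaps.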

There's a great article on automatic alternatives to manual FontForge activities

answered Nov 13 '22 at 12:11 by rok