How can one find the Unicode codepoints that a font has glyphs for, on a Debian-based system?

From a scripting language (Python or Ruby, say) on a Debian-based system, I would like to find either one of:

  1. All the Unicode codepoints that a particular font has glyphs for
  2. All the fonts that have glyphs for a particular Unicode codepoint

(Obviously either 1 or 2 can be derived from the other, so whichever is easier would be great.) I have done this in the past by running:

fc-list : file charset

... and parsing the output at the end of each line, based on this code from fontconfig, but it seems to me that there ought to be a much simpler way of doing this.
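For reference, the charset field that fc-list prints can be expanded into individual code points with a few lines of Python. This is a minimal sketch that assumes the field is a whitespace-separated list of hex tokens, each either a single code point (`2000`) or an inclusive range (`20-7e`). The function name `parse_fc_charset` is hypothetical, and isolating that field from each fc-list line is left out because the exact line layout is not shown here.

```python
def parse_fc_charset(text):
    """Expand a fontconfig-style charset string into a set of code points.

    Assumes whitespace-separated hex tokens, each either a single code
    point ("2000") or an inclusive range ("20-7e").
    """
    codepoints = set()
    for token in text.split():
        if "-" in token:
            # Inclusive range such as "20-7e".
            lo, hi = (int(part, 16) for part in token.split("-", 1))
            codepoints.update(range(lo, hi + 1))
        else:
            # Single code point such as "2000".
            codepoints.add(int(token, 16))
    return codepoints
```

For example, `0x263A in parse_fc_charset(charset_field)` tells you whether a font covers U+263A.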

(I'm not completely sure this is the right StackExchange site for this question, but I am looking for an answer that can be used programmatically.)
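Going from item 1 to item 2 amounts to inverting a dictionary. The sketch below assumes you already have a hypothetical mapping `{font_path: set_of_codepoints}`, however it was obtained, and builds `{codepoint: [font_path, ...]}` from it.

```python
from collections import defaultdict


def fonts_by_codepoint(codepoints_by_font):
    """Invert {font: codepoints} into {codepoint: sorted list of fonts}."""
    inverted = defaultdict(set)
    for font, codepoints in codepoints_by_font.items():
        for cp in codepoints:
            inverted[cp].add(font)
    # Sort font lists so the result is deterministic.
    return {cp: sorted(fonts) for cp, fonts in inverted.items()}
```

Looking up `fonts_by_codepoint(mapping).get(0x263A, [])` then lists every font that covers U+263A.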

asked Apr 09 '13 08:04 by Mark Longair


2 Answers

I would try any of the FreeType 2 language bindings. Here's a Perl solution to list the Unicode code points of a font using Font::FreeType:

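use Font::FreeType;

# foreach_char calls the sub once per character in the font's charmap,
# with $_ set to a Font::FreeType::Glyph object.
Font::FreeType->new->face('DejaVuSans.ttf')->foreach_char(sub {
    printf("%04X\n", $_->char_code);
});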
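The same loop can be written in Python. This sketch assumes the freetype-py binding, which is one of several Python FreeType bindings and is not named in the answer. It walks the charmap with `get_first_char`/`get_next_char`, which wrap FreeType's `FT_Get_First_Char`/`FT_Get_Next_Char`; a returned glyph index of 0 means the charmap is exhausted. It relies on FreeType selecting the font's Unicode charmap on load, which it does when the font has one.

```python
import sys


def iter_char_codes(face):
    """Yield every character code in the face's active charmap.

    A glyph index of 0 from get_first_char/get_next_char signals the end.
    """
    charcode, gindex = face.get_first_char()
    while gindex != 0:
        yield charcode
        charcode, gindex = face.get_next_char(charcode, gindex)


if __name__ == "__main__" and len(sys.argv) > 1:
    import freetype  # freetype-py binding (assumed to be installed)

    face = freetype.Face(sys.argv[1])
    for code in iter_char_codes(face):
        print("%04X" % code)
```

Run it as, e.g., `python list_chars.py DejaVuSans.ttf` (the script name is hypothetical). The iteration is kept apart from the freetype import so it works with any object exposing those two methods.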
answered Nov 17 '22 23:11 by nwellnhof


I recently listed the mapping from Unicode code points to glyphs in a TTF using TTX/FontTools. That tool is written in Python, so it matches the Python tag in your post. The command

ttx -t cmap foo.ttf

will generate an XML file foo.ttx that describes that mapping for various platforms and encodings. See e.g. this reference for a description of what the platform and encoding identifiers actually mean. I assume that the package can be used as a library as well as a command-line tool, but I have no experience there.
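If you go through the ttx XML file, the code points can be extracted with the standard library alone. This is a sketch that assumes the ttx layout of `cmap_format_N` elements carrying `platformID`/`platEncID` attributes, with `<map code="0x41" name="A"/>` children. It treats platform 0 (Unicode), and platform 3 (Windows) with encoding 1 or 10, as Unicode subtables. Format 14 subtables (variation sequences) use different attributes and are skipped by the `code` check. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_unicode_subtable(elem):
    # Platform 0 is Unicode; platform 3 (Windows) encodings 1 and 10
    # are Unicode BMP and full Unicode respectively.
    platform = int(elem.get("platformID", "-1"))
    encoding = int(elem.get("platEncID", "-1"))
    return platform == 0 or (platform == 3 and encoding in (1, 10))


def unicode_codepoints(root):
    """Collect code points from the Unicode cmap subtables of a ttx tree."""
    codepoints = set()
    for subtable in root.iter():
        if subtable.tag.startswith("cmap_format_") and is_unicode_subtable(subtable):
            for entry in subtable.findall("map"):
                code = entry.get("code")
                if code is not None:  # format 14 entries have no "code"
                    codepoints.add(int(code, 16))
    return codepoints
```

Call it as `unicode_codepoints(ET.parse("foo.ttx").getroot())`. fontTools can also be used directly as a library: `TTFont("foo.ttf").getBestCmap()` (with `TTFont` from `fontTools.ttLib`) returns a dict from code point to glyph name and avoids the XML round trip.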

answered Nov 17 '22 21:11 by MvG