
How to check which character sets (codepages) a font supports (has letters for)?

For my app I need to show a list of system fonts, but filter out all the fonts that do not support 20 predefined languages (the set is hardcoded) and show only those which do.

I can get the list of available fonts from Vcl.Forms.Screen.Fonts.
Knowing just the font name from that list, how do I check which character sets (codepages) the font supports (has actual letters for)?

For example, common fonts like Arial or Times New Roman have characters for almost all European languages, including Cyrillic (and also Chinese and such). Yet many less common fonts often have only English letters.

The app is for internal use, so a function that simply checks whether the font has a certain letter specific to some character set / codepage (e.g. Ф or Ў or ξ), and that the letter is not substituted with one from another generic font (or with some placeholder), would suffice.

Kromster asked Jan 31 '16 09:01


2 Answers

The GetGlyphIndices function can be used to determine whether a glyph exists in a font.

Citing the MSDN docs:

DWORD GetGlyphIndices(
  _In_  HDC     hdc,
  _In_  LPCTSTR lpstr,
  _In_  int     c,
  _Out_ LPWORD  pgi,
  _In_  DWORD   fl
);

Parameters [...]

fl [in]: Specifies how glyphs should be handled if they are not supported. This parameter can be the following value.

GGI_MARK_NONEXISTING_GLYPHS -- Marks unsupported glyphs with the hexadecimal value 0xffff.

The Remarks section also points to the Uniscribe functions, e.g. ScriptGetCMap:

This function attempts to identify a single-glyph representation for each character in the string pointed to by lpstr. While this is useful for certain low-level purposes (such as manipulating font files), higher-level applications that wish to map a string to glyphs will typically wish to use the Uniscribe functions.

As both APIs are supported from Win2k onwards, it is probably a matter of taste which one to use.

(EDIT: Just noticed that the import is already in Windows.pas)

Sample code

// Requires the Windows unit. The font to test must already be
// selected into dc.
procedure Test(dc: HDC);
var
  str: UnicodeString;
  buf: array of WORD;
  len, i: Integer;
  count: DWORD;
begin
  str := 'abc' + WideChar($0416) + 'äöü';  // U+0416 is Cyrillic Ж
  len := Length(str);
  SetLength(buf, len);
  count := GetGlyphIndicesW(dc, PWideChar(str), len, @buf[0], GGI_MARK_NONEXISTING_GLYPHS);
  // On failure the function returns GDI_ERROR, otherwise the number
  // of glyph indices written to buf.
  if count <> GDI_ERROR then
    for i := 0 to Integer(count) - 1 do
    begin
      Write('index ', i, ': ');
      if buf[i] = $FFFF then
        Writeln('glyph missing')
      else
        Writeln('ok');
    end;
end;

With a font selected into dc that lacks the Cyrillic letter Ж (U+0416), this yields

index 0: ok
index 1: ok
index 2: ok
index 3: glyph missing
index 4: ok
index 5: ok
index 6: ok
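
To go from a font name (as returned by Screen.Fonts) to a yes/no answer, the sample can be wrapped roughly as follows. This is only a sketch under a few assumptions: FontHasGlyphs is a hypothetical helper name, the font is created with default metrics on a memory DC, and surrogate pairs or combining sequences are not handled.

```pascal
uses
  Winapi.Windows;

// Hypothetical helper (not part of the VCL or the Windows API):
// returns True if the named font has a glyph for every character in Chars.
function FontHasGlyphs(const FontName, Chars: UnicodeString): Boolean;
var
  dc: HDC;
  font: HFONT;
  oldObj: HGDIOBJ;
  buf: array of WORD;
  count: DWORD;
  i: Integer;
begin
  Result := False;
  if Chars = '' then
    Exit;
  dc := CreateCompatibleDC(0);  // memory DC, no window needed
  if dc = 0 then
    Exit;
  try
    font := CreateFont(0, 0, 0, 0, FW_NORMAL, 0, 0, 0, DEFAULT_CHARSET,
      OUT_DEFAULT_PRECIS, CLIP_DEFAULT_PRECIS, DEFAULT_QUALITY,
      DEFAULT_PITCH, PChar(FontName));
    if font = 0 then
      Exit;
    try
      oldObj := SelectObject(dc, font);
      SetLength(buf, Length(Chars));
      count := GetGlyphIndicesW(dc, PWideChar(Chars), Length(Chars),
        @buf[0], GGI_MARK_NONEXISTING_GLYPHS);
      if count <> GDI_ERROR then
      begin
        Result := True;
        for i := 0 to High(buf) do
          if buf[i] = $FFFF then  // marker for a missing glyph
          begin
            Result := False;
            Break;
          end;
      end;
      SelectObject(dc, oldObj);  // restore before deleting the font
    finally
      DeleteObject(font);
    end;
  finally
    DeleteDC(dc);
  end;
end;
```

A call such as FontHasGlyphs('Arial', 'ФЎξ') could then be used to filter the font list. One caveat: if the requested face name is not installed, the GDI font mapper silently picks another font, so it may be worth comparing the result of GetTextFace against the requested name before trusting the answer.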
JensG answered Sep 21 '22 17:09



If you want to check support for an entire character set, you can use EnumFontFamiliesEx from the Windows API. It does not let you query a single font; instead it returns the list of installed fonts that support a given character set.

You'll need a callback function of the appropriate type:

function EnumFontCallback(lpelfe : PLogFont;
                          lpntme : PNewTextMetricEX;
                          FontType : DWORD;
                          lp : LPARAM) : integer; stdcall;
begin
  TMemo(lp).Lines.Add(lpelfe^.lfFaceName);
  result := 1;  // return zero to end enumeration
end;

Then call it like this:

procedure TForm1.Button1Click(Sender: TObject);
var
  lf : TLogFont;
begin
  ZeroMemory(@lf,SizeOf(TLogFont));

  lf.lfCharSet := CHINESEBIG5_CHARSET;

  if not EnumFontFamiliesEx(Canvas.Handle,      // HDC
                            lf,                 // TLogFont
                            @EnumFontCallback,  // Callback Pointer
                            LPARAM(Memo1),      // user supplied pointer
                            0) then             // must be zero
  begin
    // function call failed.
  end;
end;

EnumFontFamiliesEx examines only the lfCharSet, lfFaceName and lfPitchAndFamily fields of the TLogFont structure (the last must be zero). In this case I have restricted only the character set (to Chinese Big-5 in the above example); leaving lfFaceName empty enumerates every matching typeface.

The callback fires once for every font the query returns, so you need to collect the results as they arrive. To restrict by several character sets, call EnumFontFamiliesEx once for each character set of interest. The following constants are defined in the RTL Windows unit:

ANSI_CHARSET
BALTIC_CHARSET
CHINESEBIG5_CHARSET
DEFAULT_CHARSET      // depends on system locale
EASTEUROPE_CHARSET
GB2312_CHARSET
GREEK_CHARSET
HANGUL_CHARSET
MAC_CHARSET
OEM_CHARSET          // depends on OS
RUSSIAN_CHARSET
SHIFTJIS_CHARSET
SYMBOL_CHARSET
TURKISH_CHARSET
VIETNAMESE_CHARSET
JOHAB_CHARSET
ARABIC_CHARSET
HEBREW_CHARSET
THAI_CHARSET 

Cross-referencing would then be up to you - a TDictionary seems like a sensible tool to manage that task.
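
As a rough illustration of that cross-referencing, the sketch below counts, per face name, how many of the requested character sets report the font, and keeps only the fonts reported for all of them. CollectFaceNames and FontsSupportingAll are hypothetical names introduced here, and the counting approach is just one possible design, not the only way to do it.

```pascal
uses
  Winapi.Windows, System.Classes, System.Generics.Collections;

// Callback: lp carries a TStrings that receives each face name once.
function CollectFaceNames(lpelfe: PLogFont; lpntme: PNewTextMetricEx;
  FontType: DWORD; lp: LPARAM): Integer; stdcall;
var
  names: TStrings;
  face: string;
begin
  names := TStrings(lp);
  face := lpelfe^.lfFaceName;
  if names.IndexOf(face) < 0 then  // the callback may repeat a face name
    names.Add(face);
  Result := 1;  // keep enumerating
end;

// Hypothetical helper: fills Fonts with the faces that were reported
// for every character set listed in CharSets.
procedure FontsSupportingAll(dc: HDC; const CharSets: array of Byte;
  Fonts: TStrings);
var
  hits: TDictionary<string, Integer>;
  perCharset: TStringList;
  lf: TLogFont;
  face: string;
  i, n: Integer;
begin
  hits := TDictionary<string, Integer>.Create;
  perCharset := TStringList.Create;
  try
    for i := 0 to High(CharSets) do
    begin
      perCharset.Clear;
      ZeroMemory(@lf, SizeOf(lf));
      lf.lfCharSet := CharSets[i];
      EnumFontFamiliesEx(dc, lf, @CollectFaceNames, LPARAM(perCharset), 0);
      // Count one hit per character set for each face seen.
      for face in perCharset do
      begin
        if not hits.TryGetValue(face, n) then
          n := 0;
        hits.AddOrSetValue(face, n + 1);
      end;
    end;
    Fonts.Clear;
    for face in hits.Keys do
      if hits[face] = Length(CharSets) then
        Fonts.Add(face);
  finally
    perCharset.Free;
    hits.Free;
  end;
end;
```

It could be called as FontsSupportingAll(Canvas.Handle, [RUSSIAN_CHARSET, GREEK_CHARSET], Memo1.Lines). Keep in mind that this reflects the character sets a font declares, which is coarser than a per-glyph check: a font reported for a character set may still lack individual letters.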

J... answered Sep 23 '22 17:09
