Using pdfkit in node.js to render text in any language

I am using pdfkit to dynamically generate PDF documents within a node-webkit application. The PDFs contain people's comments, which come from a remote source via an HTTP request.

It works really well, but I have noticed that comments in Japanese, Chinese, Arabic, etc. don't render correctly. I have no way of knowing in advance what language a comment will be in, since I am gathering them from around the world.

I understand that I need a font that includes the right characters, as explained here. I found the open "Google Noto" font family, which covers them all, but the problem is that there is no single TTF file with all languages, and there can't be, since a font file is limited to about 65K glyphs.

I am trying to find a solution that lets me render text in (almost) any language within a PDF using pdfkit, without having to write a sophisticated language recognition tool, which I feel would be overkill.

Any thoughts and suggestions will be much appreciated.

UPDATE: Use font-manager, by the author of pdfkit, to substitute the font. You may also want to try PhantomJS, though I haven't done that myself. See the detailed response by @levi in the comments if you have the same problem. Hope it helps.

Alexey Stoletny asked Nov 09 '22 21:11



1 Answer

Here is one idea. Download the fonts for the most popular languages, add them to a list, and sort it by popularity. For each comment, get the Unicode values of n random characters within the string. For each character, if its code is above 127 (outside the ASCII range), the comment may not be in English. Using opentype.js, parse the font files one by one and, for each font, check whether its cmap table has glyphs for all the sampled character codes. If it does, choose that font and cache a mapping from each Unicode code to the font. Otherwise, try the next font.
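The selection loop described above could be sketched roughly as follows. This is a minimal, dependency-free illustration and not a tested pdfkit integration: the font names and the code point ranges in `FONTS` are hypothetical stand-ins for a real cmap lookup (with opentype.js, a check along the lines of `font.charToGlyphIndex(ch) !== 0` could play the role of `covers`). It also samples only the non-ASCII characters, which is an assumption added here so that a few Latin letters in a Japanese comment don't steer the choice.

```javascript
// Build a coverage check from [low, high] code point ranges.
// Assumption: a stand-in for looking the code point up in a parsed font's cmap table.
const inRanges = (ranges) => (cp) => ranges.some(([lo, hi]) => cp >= lo && cp <= hi);

// Hypothetical font list, ordered most popular first. Names and ranges are illustrative only.
const FONTS = [
  { name: "NotoSans-Regular", covers: inRanges([[0x0020, 0x024f], [0x0400, 0x04ff]]) },
  { name: "NotoSansArabic-Regular", covers: inRanges([[0x0020, 0x007e], [0x0600, 0x06ff]]) },
  { name: "NotoSansCJKjp-Regular", covers: inRanges([[0x0020, 0x007e], [0x3000, 0x30ff], [0x4e00, 0x9fff]]) },
];

// Cache: code point -> font entry that was chosen for it earlier.
const fontCache = new Map();

function pickFont(text, n = 5) {
  // Iterate by code point (not UTF-16 unit) and keep only non-ASCII characters.
  const nonAscii = Array.from(text, (ch) => ch.codePointAt(0)).filter((cp) => cp > 127);
  if (nonAscii.length === 0) return FONTS[0].name; // plain ASCII: use the default font

  // Take up to n random samples (with replacement) from the non-ASCII characters.
  const count = Math.min(n, nonAscii.length);
  const samples = Array.from({ length: count },
    () => nonAscii[Math.floor(Math.random() * nonAscii.length)]);

  // Reuse a cached font if it covers every sampled character.
  const cached = fontCache.get(samples[0]);
  if (cached && samples.every((cp) => cached.covers(cp))) return cached.name;

  // Otherwise try the fonts in order of popularity.
  const font = FONTS.find((f) => samples.every((cp) => f.covers(cp)));
  if (!font) return null; // no font in the list covers the samples
  samples.forEach((cp) => fontCache.set(cp, font));
  return font.name;
}
```

The returned name would then be mapped to a font file and passed to pdfkit before writing the comment, for example with `doc.font(pathToFile).text(comment)`. Because only a sample is checked, a comment that mixes several scripts can still end up with missing glyphs.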

Upon further consideration, it seems TTF files provide info on the Unicode ranges they support via the ulUnicodeRange fields of the OS/2 table. So perhaps you could build a mapping between each font and the Unicode ranges it supports ahead of time, and use this to select the correct font, instead of parsing each font at run-time.
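A precomputed mapping of that kind might look like the sketch below. The rows of `RANGE_TABLE`, the font names, and `FALLBACK_FONT` are hypothetical; in practice the table would be generated once from the fonts' declared Unicode ranges, which describe blocks the font is meant to support rather than guaranteeing a glyph for every code point. As an extra step beyond what is described above, the sketch also splits a string into runs per font, so a comment that mixes scripts can switch fonts mid-line.

```javascript
// Hypothetical precomputed table: [firstCodePoint, lastCodePoint, fontName].
// Rows must be sorted by start and must not overlap, so binary search works.
const RANGE_TABLE = [
  [0x0000, 0x024f, "NotoSans-Regular"],
  [0x0400, 0x04ff, "NotoSans-Regular"],
  [0x0600, 0x06ff, "NotoSansArabic-Regular"],
  [0x3000, 0x30ff, "NotoSansCJKjp-Regular"],
  [0x4e00, 0x9fff, "NotoSansCJKjp-Regular"],
];

// Assumption: font used when no range matches.
const FALLBACK_FONT = "NotoSans-Regular";

// Binary search for the range that contains the code point.
function fontForCodePoint(cp) {
  let lo = 0;
  let hi = RANGE_TABLE.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const [start, end, font] = RANGE_TABLE[mid];
    if (cp < start) hi = mid - 1;
    else if (cp > end) lo = mid + 1;
    else return font;
  }
  return FALLBACK_FONT;
}

// Group consecutive characters that map to the same font into one run.
function splitIntoRuns(text) {
  const runs = [];
  for (const ch of text) { // for...of iterates by code point
    const font = fontForCodePoint(ch.codePointAt(0));
    const last = runs[runs.length - 1];
    if (last && last.font === font) last.text += ch;
    else runs.push({ font, text: ch });
  }
  return runs;
}
```

Each run could then be written with its own font, for instance by calling `doc.font(...)` before each piece and using pdfkit's `continued` text option to keep the pieces on one line.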

levi answered Nov 14 '22 22:11
