Accessing font files within PDF

We are currently working with a selection of publishers to generate online books from their PDFs. Our legacy app uses Flex, so we are converting the PDFs to SWF files using pdf2swf from SWFTools.

The problem that we are having is that the text within the SWF document is not being highlighted by our flex reader when the user performs a search. After a quick investigation we found that when extracting text we need to embed the fonts that are used by the PDF document:

http://wiki.swftools.org/wiki/How_do_I_highlight_text_in_the_SWF%3F

pdf2swf -F $YOUR_FONTS_DIR$ -f input.pdf -o output.swf

As you can see from the command above, we need a path to a font directory containing the fonts used within that PDF.
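Since our app is written in Java, the call could be assembled roughly as follows. This is only a minimal sketch: the class and method names are hypothetical, it assumes pdf2swf is on the PATH, and it uses only the flags from the command line above.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of SWFTools): wraps the pdf2swf call shown above.
final class Pdf2SwfCommand {

    // Builds the same argument list as the command line above:
    // pdf2swf -F <fonts dir> -f <input.pdf> -o <output.swf>
    static List<String> build(Path fontsDir, Path inputPdf, Path outputSwf) {
        return Arrays.asList(
                "pdf2swf",
                "-F", fontsDir.toString(),
                "-f", inputPdf.toString(),
                "-o", outputSwf.toString());
    }

    // Runs the command and returns its exit code.
    // Assumption: the pdf2swf executable can be found on the PATH.
    static int run(Path fontsDir, Path inputPdf, Path outputSwf)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(build(fontsDir, inputPdf, outputSwf))
                .inheritIO() // forward pdf2swf's output to our own console
                .start();
        return process.waitFor();
    }
}
```

Passing the arguments as a list rather than one concatenated string avoids quoting problems when the font directory or file names contain spaces.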

Since we will be converting a large number of PDFs, is it possible to access the font files directly through the PDF rather than having a lot of fonts stored within our app?

Additional Information

Our app is written in Java.

We are currently using PDFBox and Ghostscript within the app, so a solution that uses these libraries would be preferred, but we are open to all ideas.

asked Jan 06 '12 13:01 by My Head Hurts


1 Answer

PDF files don't contain font 'files'; they may not even contain any fonts at all, though this is rare. The embedded font data can be in a bewildering variety of formats:

  • type 1 PostScript fonts
  • type 3 PostScript fonts
  • TrueType fonts
  • PostScript CFF fonts
  • CIDFonts with type 1 PostScript outlines
  • CIDFonts with type 3 PostScript outlines
  • CIDFonts with TrueType outlines
  • CIDFonts with CFF outlines
  • CIDFonts with bitmap images

Will your application be able to read all these font formats? If you want to use the fonts, you must use the ones embedded in the PDF file: these will very often be subset fonts supplied with a custom Encoding, which means that even if you have the original font, you can't use it because its Encoding will not be correct.
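To illustrate the subset point: in a PDF, a subset font's name conventionally starts with a tag of six uppercase letters followed by a plus sign, for example ABCDEF+Helvetica. A minimal Java sketch for spotting such names is below. The class and method names are hypothetical, and it assumes you have already read the font names out of the PDF by some other means, for instance with PDFBox.

```java
import java.util.regex.Pattern;

// Hypothetical helper: recognises the subset-tag naming convention in PDF font names.
final class SubsetFontNames {

    // Six uppercase letters, a plus sign, then the actual font name.
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+.+");

    // True if the name carries a subset tag such as "ABCDEF+".
    static boolean isSubset(String fontName) {
        return fontName != null && SUBSET_TAG.matcher(fontName).matches();
    }

    // Strips the seven-character tag ("ABCDEF+") if present, otherwise returns the name unchanged.
    static String baseName(String fontName) {
        return isSubset(fontName) ? fontName.substring(7) : fontName;
    }
}
```

A name that passes this check tells you the embedded font only contains the glyphs the document uses, so a copy of the full font from a directory is not a drop-in replacement for it.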

Of course it may be that these PDF files are all created in a consistent way and do not use embedded fonts, but I have my doubts....

answered Oct 06 '22 09:10 by KenS