Understanding loading of font in PDFBox 2.0

Tags: pdfbox

I have finally succeeded in making PDFBox print my Unicode characters, but now I would like to understand the solution I came up with. The code below works and prints a ≥ to the page.

Two things do not work:

  • changing PDType0Font.load(documentMock, systemResourceAsStream, true); to PDType0Font.load(documentMock, systemResourceAsStream, false);

  • changing final PDFont robotoLight = loadFontAlternative("Roboto-Light.ttf"); to final PDFont robotoLight = loadFont("Roboto-Light.ttf");

The first change prints two dots instead of the character. What does embedSubset do, and why does the code fail when it is set to false? The documentation is too sparse for me to understand.

The second change throws the following exception:

Exception in thread "main" java.lang.IllegalArgumentException: U+2265 is not available in this font's encoding: WinAnsiEncoding

This problem has been covered in many other questions, but they predate PDFBox 2.0, when there was a bug in the handling of Unicode characters, so they do not answer my question directly. That aside, the problem is clear: I should not set the encoding to WinAnsiEncoding but to something else. But what should the encoding be, and why is there no UTF-8 encoding or similar available? COSName has no documentation on the many options.
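A side note on the exception above: a simple font such as a PDTrueTypeFont addresses its glyphs with one-byte character codes, so an encoding like WinAnsiEncoding can cover at most 256 characters, and ≥ (U+2265) is not among them. The following minimal sketch reproduces that limit without PDFBox. It assumes that the JDK's windows-1252 charset is a close enough stand-in for WinAnsiEncoding (the two are related but not identical); the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class SingleByteEncodingDemo {

    // Assumption: windows-1252 stands in for PDF's WinAnsiEncoding.
    // The two are closely related but not identical.
    private static final Charset WIN_ANSI_LIKE = Charset.forName("windows-1252");

    /** True if every character of the text has a one-byte code in the stand-in charset. */
    static boolean fitsSingleByteEncoding(String text) {
        return WIN_ANSI_LIKE.newEncoder().canEncode(text);
    }

    /** Number of bytes the text occupies when serialized as UTF-8. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9";        // é, present in windows-1252
        String greaterEqual = "\u2265";  // ≥, the character from the question

        System.out.println("U+00E9 fits in one byte: " + fitsSingleByteEncoding(eAcute));
        System.out.println("U+2265 fits in one byte: " + fitsSingleByteEncoding(greaterEqual));
        System.out.println("U+2265 length in UTF-8:  " + utf8Length(greaterEqual) + " bytes");
    }
}
```

Under these assumptions, é fits in a single byte but ≥ does not, and storing ≥ as UTF-8 takes three bytes. Characters outside the single-byte range need a composite font such as PDType0Font, which uses multi-byte character codes.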

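import java.io.IOException;
import java.io.InputStream;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDTrueTypeFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import org.apache.pdfbox.pdmodel.font.encoding.Encoding;

public class SimpleReportUnicode {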
    public static void main(String[] args) throws IOException {
        PDDocument report = createReport();
        final String fileLocation = "c:/SimpleFormUnicode.pdf";
        report.save(fileLocation);
        report.close();
    }

    private static PDDocument createReport() throws IOException {
        PDDocument document = new PDDocument();
        PDPage page = new PDPage();
        document.addPage(page);

        PDPageContentStream contentStream = new PDPageContentStream(document, page);
        final PDFont robotoLight = loadFontAlternative("Roboto-Light.ttf");
        writeText(contentStream, robotoLight, 100, 650);

        contentStream.close();
        return document;
    }

    private static void writeText(PDPageContentStream contentStream, PDFont font, double x, double y) throws IOException {
        contentStream.beginText();
        contentStream.setFont(font, 12);
        // newLineAtOffset replaces moveTextPositionByAmount, which is deprecated in PDFBox 2.0
        contentStream.newLineAtOffset((float) x, (float) y);
        String unicode = "≥";
        contentStream.showText(unicode);
        contentStream.endText();
    }

    private static PDFont loadFont(String location) {
        PDFont font;
        try {
            PDDocument documentMock = new PDDocument();
            InputStream systemResourceAsStream = ClassLoader.getSystemResourceAsStream(location);
            Encoding encoding = Encoding.getInstance(COSName.WIN_ANSI_ENCODING);
            font = PDTrueTypeFont.load(documentMock, systemResourceAsStream, encoding);
        }
        catch (IOException e) {
            // Keep the original exception as the cause so the stack trace is not lost
            throw new RuntimeException("Could not load font " + location, e);
        }
        return font;
    }

    private static PDFont loadFontAlternative(String location) {
        PDDocument documentMock = new PDDocument();
        InputStream systemResourceAsStream = ClassLoader.getSystemResourceAsStream(location);
        PDFont font;
        try {
            font = PDType0Font.load(documentMock, systemResourceAsStream, true);
        }
        catch (IOException e) {
            // Keep the original exception as the cause so the stack trace is not lost
            throw new RuntimeException("Could not load font " + location, e);
        }
        return font;
    }
}

EDIT: If you want to use the same font as in the code, Roboto is available at https://fonts.google.com/specimen/Roboto. Add Roboto-Light.ttf to your classpath and the code should work out of the box.

Little Helper asked Nov 03 '17 09:11

1 Answer

As discussed in the comments:

  • The problem with embedSubset went away after upgrading to version 2.0.7. (By the way, 2.0.8 was released today.)
  • The problem "U+2265 is not available in this font's encoding: WinAnsiEncoding" is explained in the FAQ. The solution is to use PDType0Font.load(), which you already did in your working version.
  • There is no UTF-8 encoding for fonts because the PDF specification does not define one.
  • With embedSubset set to true the output file is 4KB; with false it is 100KB because the full font is embedded. So true is usually the better choice.
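
As a rough mental model of the size difference (this is not PDFBox's actual implementation), a subset only has to carry glyphs for the characters the document really shows. The sketch below, with hypothetical class and method names, collects those distinct code points from the strings that would be passed to showText:

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

final class SubsetSketch {

    /** Collects the distinct Unicode code points used across all shown strings. */
    static SortedSet<Integer> usedCodePoints(List<String> shownTexts) {
        SortedSet<Integer> used = new TreeSet<>();
        for (String text : shownTexts) {
            // codePoints() handles surrogate pairs, so each entry is a full code point
            text.codePoints().forEach(cp -> used.add(cp));
        }
        return used;
    }

    public static void main(String[] args) {
        // The question's document shows a single string containing only U+2265
        List<String> shown = Arrays.asList("\u2265");
        for (int codePoint : usedCodePoints(shown)) {
            System.out.println("glyph needed for U+" + Integer.toHexString(codePoint).toUpperCase());
        }
    }
}
```

For the question's document, which shows only ≥, the set has a single entry, whereas the full Roboto-Light font also contains glyphs for many characters that never appear on the page.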
Tilman Hausherr answered Sep 24 '22 09:09