
PDFBox hasGlyph() returns true for unsupported unicode control characters

I'm using Apache's PDFBox library to write a PdfDocumentBuilder class. Before writing a character to the file, I call currentFont.hasGlyph(character) to check that the font has a glyph for it. The problem is that for a Unicode control character such as '\u001f', hasGlyph() returns true, and encode() then throws an exception when the text is written (see the PdfDocumentBuilder code and stack trace below).

From what I could find, the font I'm using (Courier Prime) does not support these Unicode control characters.

So why does hasGlyph() return true for Unicode control characters that the font does not support? I could of course strip the control characters from the line with a simple replaceAll before calling writeTextWithSymbol(), but if hasGlyph() isn't working the way I expect, I have a bigger problem.
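For reference, the replaceAll-style workaround mentioned above could look roughly like the sketch below. It is only an illustration: the class and method names (ControlCharFilter, stripControlChars) are hypothetical and not part of the original code, and it assumes that dropping every character in the Unicode "Control" category is acceptable, which also removes tabs and line breaks.

```java
import java.util.regex.Pattern;

public class ControlCharFilter {
    // \p{Cc} matches the Unicode "Control" general category
    // (U+0000 to U+001F and U+007F to U+009F), including tab and newline.
    private static final Pattern CONTROL_CHARS = Pattern.compile("\\p{Cc}");

    /** Returns the text with every control character removed. */
    public static String stripControlChars(String text) {
        return CONTROL_CHARS.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String cleaned = stripControlChars("abc\u001fdef");
        System.out.println(cleaned); // prints "abcdef"
    }
}
```

This only hides the symptom for control characters; it does not tell you whether any other character in the text can be encoded by the font.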

PdfDocumentBuilder:

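private final PDType0Font baseFont;
private PDType0Font currentFont;

// PDType0Font.load() can throw IOException, so the constructor has to declare it
public PdfDocumentBuilder () throws IOException {
    // doc is the PDDocument the font is embedded into (field not shown here)
    baseFont = PDType0Font.load(doc, this.getClass().getResourceAsStream("/CourierPrime.ttf"));
    currentFont = baseFont;
}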

private void writeTextWithSymbol (String text) throws IOException {
    StringBuilder nonSymbolBuffer = new StringBuilder();
    for (char character : text.toCharArray()) {
        if (currentFont.hasGlyph(character)) {
            nonSymbolBuffer.append(character);
        } else {
            //handling writing line with symbols...
        }
    }
    if (nonSymbolBuffer.length() > 0) {
        content.showText(nonSymbolBuffer.toString());
    }
}

Stack trace:

java.lang.IllegalArgumentException: No glyph for U+001F in font CourierPrime
at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.encode(PDCIDFontType2.java:400)
at org.apache.pdfbox.pdmodel.font.PDType0Font.encode(PDType0Font.java:351)
at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:316)
at org.apache.pdfbox.pdmodel.PDPageContentStream.showText(PDPageContentStream.java:414)
at org.main.export.PdfDocumentBuilder.writeTextWithSymbol(PdfDocumentBuilder.java:193)
Kate Barnett asked Mar 03 '17 16:03


1 Answer

As explained in the comments, hasGlyph() is not meant to take a Unicode character as its parameter; it expects a character code in the font's own encoding. So if you need to check whether a character can be encoded before writing it, you can try to encode it and treat the exception as the answer, something like this:

private void writeTextWithSymbol (String text) throws IOException {
    StringBuilder nonSymbolBuffer = new StringBuilder();
    for (char character : text.toCharArray()) {
        if (isCharacterEncodeable(character)) {
            nonSymbolBuffer.append(character);
        } else {
            //handle writing line with symbols...
        }
    }
    if (nonSymbolBuffer.length() > 0) {
        content.showText(nonSymbolBuffer.toString());
    }
}

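// Returns true if the current font can encode the character, i.e. showText() would not throw for it
private boolean isCharacterEncodeable (char character) throws IOException {
    try {
        // encode() throws IllegalArgumentException when the font has no glyph for the character
        currentFont.encode(Character.toString(character));
        return true;
    } catch (IllegalArgumentException iae) {
        LOGGER.trace("Character cannot be encoded", iae);
        return false;
    }
}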
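One caveat with the loop above is that it walks the string char by char, so a character outside the Basic Multilingual Plane would be tested as two separate surrogate halves. A possible variant, sketched below, walks code points instead and groups the text into runs. Everything named here (EncodableRuns, Run, split, the canEncode predicate) is hypothetical; in the builder, the predicate would presumably wrap a try/catch around currentFont.encode(), which is replaced here by a plain control-character check so the example runs without PDFBox.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class EncodableRuns {
    /** A maximal run of consecutive code points that are all encodable or all not. */
    public static final class Run {
        public final String text;
        public final boolean encodable;

        Run(String text, boolean encodable) {
            this.text = text;
            this.encodable = encodable;
        }
    }

    /** Splits text into alternating runs, testing one code point at a time. */
    public static List<Run> split(String text, IntPredicate canEncode) {
        List<Run> runs = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        boolean bufferEncodable = false;
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            boolean encodable = canEncode.test(codePoint);
            // Flush the buffer whenever the encodable status changes
            if (buffer.length() > 0 && encodable != bufferEncodable) {
                runs.add(new Run(buffer.toString(), bufferEncodable));
                buffer.setLength(0);
            }
            bufferEncodable = encodable;
            buffer.appendCodePoint(codePoint);
            // Advance by 2 chars for a surrogate pair, 1 otherwise
            i += Character.charCount(codePoint);
        }
        if (buffer.length() > 0) {
            runs.add(new Run(buffer.toString(), bufferEncodable));
        }
        return runs;
    }

    public static void main(String[] args) {
        // Stand-in predicate (assumption): treat control characters as not encodable
        IntPredicate notControl = cp -> !Character.isISOControl(cp);
        for (Run run : split("abc\u001fdef", notControl)) {
            System.out.println(run.encodable + " " + run.text.length());
        }
    }
}
```

Encodable runs could then be passed to showText() in one call each, while the non-encodable runs go through the symbol-handling branch.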
Kate Barnett answered Oct 18 '22 11:10