pdfbox: ... is not available in this font's encoding

I'm having problems with pdfbox 2.0.2 when writing a PDF document from elements of a previously read document (https://www.dropbox.com/s/ttxiv0dq3abh5kj/Test.pdf?dl=0). Everything works fine except when I call showText on a PDPageContentStream for which I previously set the font with out.setFont(textState.getFont(), textState.getFontSize()) (see the INFORMATION log below) and the font is ComicSansMS or ArialBlack. textState is (a clone of) the text state from the previously read document. Writing text with Helvetica or Times-Roman works fine.

INFORMATION: set font PDTrueTypeFont RXNQOL+ComicSansMS,Bold/18.0 embedded    
SEVERE: error writing <w>U+0077 is not available in this font's encoding: built-in (TTF)

I suppose the problem may be caused by a missing hyphen or blank in the font name, but I have no clue how to fix this.

Here is the complete code:

import java.awt.Point;
import java.awt.geom.Point2D;
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.contentstream.PDFGraphicsStreamEngine;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.graphics.image.PDImage;
import org.apache.pdfbox.pdmodel.graphics.state.PDTextState;
import org.apache.pdfbox.util.Matrix;
import org.apache.pdfbox.util.Vector;

public class Test extends PDFGraphicsStreamEngine {

public static void main(String[] args) throws IOException {
    test();
}

public static void test() throws IOException {
    PDDocument document = PDDocument.load(new File("Test.pdf"));
    PDPage pageIn = document.getPage(0);
    PDDocument saveDoc = new PDDocument();
    PDPage savePage = new PDPage(pageIn.getMediaBox());
    saveDoc.addPage(savePage);
    try (PDPageContentStream out = new PDPageContentStream(saveDoc, savePage)) {
        Test test = new Test(pageIn, out);
        test.processPage(pageIn);
    }
}

private final PDPageContentStream out;

public Test(PDPage pageIn, PDPageContentStream out) {
    super(pageIn);
    this.out = out;
}

@Override
public void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) throws IOException {
}

@Override
public void clip(int windingRule) throws IOException {
}

@Override
public void closePath() throws IOException {
}

@Override
public void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) throws IOException {
}

@Override
public void drawImage(PDImage pdImage) throws IOException {
}

@Override
public void endPath() throws IOException {
}

@Override
public void fillAndStrokePath(int windingRule) throws IOException {
}

@Override
public void fillPath(int windingRule) throws IOException {
}

@Override
public Point2D getCurrentPoint() {
    return new Point(0, 0);
}

@Override
public void lineTo(float x, float y) throws IOException {
}

@Override
public void moveTo(float x, float y) throws IOException {
}

@Override
public void shadingFill(COSName shadingName) throws IOException {
}

@Override
protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement) throws IOException {
    super.showGlyph(textRenderingMatrix, font, code, unicode, displacement);
    PDTextState textState = getGraphicsState().getTextState();
    out.beginText();
    out.setTextMatrix(getTextMatrix());
    out.setFont(textState.getFont(), textState.getFontSize());
    out.showText(unicode);
    out.endText();
}

@Override
public void strokePath() throws IOException {
}

}

Any suggestions?

Thanks, Juergen

asked Aug 05 '16 10:08

Juergen

1 Answer

tl;dr: That font doesn't support encoding.

The cause of the problem is that your subsetted Comic Sans font does have a "post" (PostScript) table, but the glyphNames table in it is null, i.e. your font does not have glyph names. For A-Z and a-z the glyph names are the characters themselves; for "(" the glyph name is "parenleft". Because these names are missing, PDFBox creates pseudo names from the glyph ID, like "90" (instead of "w") for "w", in the second part of PDTrueTypeFont.readEncodingFromFont().

However, when encoding, PDFBox uses the Adobe Glyph List, because the font does not have an encoding entry. It therefore looks for the name "w" and does not find it among the pseudo names. If you look with PDFDebugger at the other fonts, e.g. R18, you'll find "Encoding: WinAnsiEncoding".
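
To make the mismatch concrete, here is a toy model in plain Java. It is only a sketch of the lookup described above and does not use PDFBox's real classes: GlyphNameMismatchDemo, its maps and the code value 0x77 are made up for illustration, and only the names "w", "parenleft" and "90" come from the explanation above.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: mimics the name mismatch, not PDFBox's actual implementation.
final class GlyphNameMismatchDemo {

    // Tiny stand-in for the Adobe Glyph List: Unicode code point -> glyph name.
    static final Map<Integer, String> GLYPH_LIST = new HashMap<>();
    static {
        GLYPH_LIST.put(0x0077, "w");
        GLYPH_LIST.put(0x0028, "parenleft");
    }

    // Encoding as it would look for a font that has real glyph names.
    static Map<String, Integer> encodingWithGlyphNames() {
        Map<String, Integer> encoding = new HashMap<>();
        encoding.put("w", 0x77);          // assumed code, for illustration
        encoding.put("parenleft", 0x28);  // assumed code, for illustration
        return encoding;
    }

    // Encoding for a font without glyph names: the key is a pseudo name
    // built from the glyph ID ("90"), so a lookup by "w" cannot succeed.
    static Map<String, Integer> encodingWithPseudoNames() {
        Map<String, Integer> encoding = new HashMap<>();
        encoding.put("90", 0x77);         // assumed code, for illustration
        return encoding;
    }

    // Maps the code point to a glyph name, then the glyph name to a code.
    static int encode(int codePoint, Map<String, Integer> encoding) {
        String name = GLYPH_LIST.get(codePoint);
        Integer code = (name == null) ? null : encoding.get(name);
        if (code == null) {
            throw new IllegalArgumentException(String.format(
                    "U+%04X is not available in this font's encoding", codePoint));
        }
        return code;
    }

    public static void main(String[] args) {
        // Works: the encoding knows the glyph name "w".
        System.out.println(encode('w', encodingWithGlyphNames()));
        try {
            // Fails: the encoding only knows the pseudo name "90".
            encode('w', encodingWithPseudoNames());
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The second call fails for the same reason your showText call does: the text is looked up by glyph name, but the encoding read from the font only contains names derived from glyph IDs.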

What you are apparently doing is creating a new page with text only. A different way to do this is to analyse the content streams and simply remove all tokens that paint anything other than text. To start with that, have a look at the RemoveAllText example in the source code download, download the PDF 32000 specification, look at the "operator summary" part, and be careful what you delete. For example, "Do" is used both to draw images and to draw form XObjects, which are themselves content streams.

See here: How can I remove all images/drawings from a PDF file and leave text only in Java?

Both solutions there are wrong: the first one just pulls all images out from under the page's feet, and the second one is a good start but does not check whether the operand is an image or not.
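
As a rough illustration of the filtering idea, here is a toy model in plain Java. It is a sketch under loud assumptions: tokens are plain strings rather than PDFBox objects, TextOnlyFilterDemo, Instruction and the xobjectSubtypes map are hypothetical names, and the operator set is not exhaustive (inline images, for example, are not handled). With PDFBox you would parse and rewrite the real tokens along the lines of the RemoveAllText example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: strings stand in for parsed content stream tokens.
final class TextOnlyFilterDemo {

    // One operator together with the operands that precede it in the stream.
    static final class Instruction {
        final String operator;
        final List<String> operands;

        Instruction(String operator, String... operands) {
            this.operator = operator;
            this.operands = Arrays.asList(operands);
        }
    }

    // Path construction, path painting, clipping and shading operators.
    // Dropping "W"/"W*" together with the path also removes clipping,
    // which can change which text is visible.
    static final Set<String> PATH_OPERATORS = new HashSet<>(Arrays.asList(
            "m", "l", "c", "v", "y", "h", "re",
            "S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "n",
            "W", "W*", "sh"));

    // Keeps everything except path operators and "Do" calls that draw images.
    // xobjectSubtypes maps an XObject name to its subtype ("Image" or "Form").
    static List<Instruction> keepText(List<Instruction> stream,
                                      Map<String, String> xobjectSubtypes) {
        List<Instruction> kept = new ArrayList<>();
        for (Instruction instruction : stream) {
            if (PATH_OPERATORS.contains(instruction.operator)) {
                continue;
            }
            if ("Do".equals(instruction.operator)) {
                String subtype = xobjectSubtypes.get(instruction.operands.get(0));
                if ("Image".equals(subtype)) {
                    continue; // an image: drop it
                }
                // A form XObject is kept here; its own content stream
                // would need the same filtering.
            }
            kept.add(instruction);
        }
        return kept;
    }
}
```

The point of the "Do" branch is the check on the subtype: the operator alone does not tell you whether an image or a form is drawn, so the decision has to come from the resources.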

answered Sep 25 '22 16:09

Tilman Hausherr
