pdfbox: ... is not available in this font's encoding

I'm having problems with pdfbox 2.0.2 when writing a PDF document from elements of a previously read document (https://www.dropbox.com/s/ttxiv0dq3abh5kj/Test.pdf?dl=0). Everything works fine, except when I call showText on a PDPageContentStream after setting the font with out.setFont(textState.getFont(), textState.getFontSize()) (see the INFORMATION log) and the font is ComicSansMS or ArialBlack. textState is (a clone of) the text state from the previously read document. Writing text with Helvetica or Times-Roman works fine.

INFORMATION: set font PDTrueTypeFont RXNQOL+ComicSansMS,Bold/18.0 embedded    
SEVERE: error writing <w>U+0077 is not available in this font's encoding: built-in (TTF)

I suppose the problem may be caused by a missing hyphen or blank in the font name, but I have no clue how to fix this.

Here is the complete code

import java.awt.Point;
import java.awt.geom.Point2D;
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.contentstream.PDFGraphicsStreamEngine;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.graphics.image.PDImage;
import org.apache.pdfbox.pdmodel.graphics.state.PDTextState;
import org.apache.pdfbox.util.Matrix;
import org.apache.pdfbox.util.Vector;

public class Test extends PDFGraphicsStreamEngine {

public static void main(String[] args) throws IOException {
    test();
}

public static void test() throws IOException {
    PDDocument document = PDDocument.load(new File("Test.pdf"));
    PDPage pageIn = document.getPage(0);
    PDDocument saveDoc = new PDDocument();
    PDPage savePage = new PDPage(pageIn.getMediaBox());
    saveDoc.addPage(savePage);
    try (PDPageContentStream out = new PDPageContentStream(saveDoc, savePage)) {
        Test test = new Test(pageIn, out);
        test.processPage(pageIn);
    }
}

private final PDPageContentStream out;

public Test(PDPage pageIn, PDPageContentStream out) {
    super(pageIn);
    this.out = out;
}

@Override
public void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) throws IOException {
}

@Override
public void clip(int windingRule) throws IOException {
}

@Override
public void closePath() throws IOException {
}

@Override
public void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) throws IOException {
}

@Override
public void drawImage(PDImage pdImage) throws IOException {
}

@Override
public void endPath() throws IOException {
}

@Override
public void fillAndStrokePath(int windingRule) throws IOException {
}

@Override
public void fillPath(int windingRule) throws IOException {
}

@Override
public Point2D getCurrentPoint() {
    return new Point(0, 0);
}

@Override
public void lineTo(float x, float y) throws IOException {
}

@Override
public void moveTo(float x, float y) throws IOException {
}

@Override
public void shadingFill(COSName shadingName) throws IOException {
}

@Override
protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement) throws IOException {
    super.showGlyph(textRenderingMatrix, font, code, unicode, displacement);
    PDTextState textState = getGraphicsState().getTextState();
    out.beginText();
    out.setTextMatrix(getTextMatrix());
    out.setFont(textState.getFont(), textState.getFontSize());
    out.showText(unicode);
    out.endText();
}

@Override
public void strokePath() throws IOException {
}

}

Any suggestions?

Thanks, Juergen

Asked by Juergen on Aug 05 '16 at 10:08


1 Answer

tl;dr: That font cannot be used to encode (write) new text.

The cause of the problem is that your subsetted Comic Sans font does have a "post" (PostScript) table, but its glyphNames table is null, i.e. the font has no glyph names. For A-Z and a-z the glyph names are the characters themselves; for "(" the glyph name is "parenleft". Because these names are missing, PDFBox creates pseudo names from the glyph ID, like "90" (instead of "w") for "w", in the second part of PDTrueTypeFont.readEncodingFromFont().

However, when encoding, PDFBox uses the Adobe Glyph List, as the font does not have an encoding entry. It therefore looks for the glyph name "w", which the font's pseudo names do not contain. If you look at the other fonts with PDFDebugger, e.g. R18, you'll find "Encoding: WinAnsiEncoding".
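The mismatch can be illustrated with a small self-contained model. This is only a sketch under simplifying assumptions, not PDFBox's actual code: the class GlyphNameMismatchDemo, its methods, the tiny glyph list and the character codes are all hypothetical. Only the glyph names "w" and "parenleft" and the pseudo name "90" come from the explanation above.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the name mismatch; hypothetical names, not PDFBox code.
public class GlyphNameMismatchDemo {

    // Tiny stand-in for the Adobe Glyph List: Unicode code point -> glyph name.
    static final Map<Integer, String> GLYPH_LIST = new HashMap<>();
    static {
        GLYPH_LIST.put((int) 'w', "w");
        GLYPH_LIST.put((int) '(', "parenleft");
    }

    // Font whose "post" table carries real glyph names (name -> illustrative code).
    public static Map<String, Integer> fontWithGlyphNames() {
        Map<String, Integer> names = new HashMap<>();
        names.put("w", 0x77);
        names.put("parenleft", 0x28);
        return names;
    }

    // Font without glyph names: the only name available is the glyph ID, "90".
    public static Map<String, Integer> fontWithPseudoNames() {
        Map<String, Integer> names = new HashMap<>();
        names.put("90", 0x77);
        return names;
    }

    // Look up the standard glyph name for the code point, then ask the font for it.
    public static int encode(int codePoint, Map<String, Integer> fontNames) {
        String name = GLYPH_LIST.get(codePoint);
        Integer code = (name == null) ? null : fontNames.get(name);
        if (code == null) {
            throw new IllegalArgumentException(String.format(
                    "U+%04X is not available in this font's encoding", codePoint));
        }
        return code;
    }

    public static void main(String[] args) {
        // Succeeds: the font knows the name "w".
        System.out.println(encode('w', fontWithGlyphNames()));
        try {
            // Fails: the font only knows the pseudo name "90".
            encode('w', fontWithPseudoNames());
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the glyph for "w" is present in both fonts; the lookup fails in the second one only because the name used to find it is different.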

What you are apparently trying to do is create a new page that contains only the text. A different way to do this is to analyse the content streams and simply remove all tokens that paint anything other than text. To get started, have a look at the RemoveAllText example in the source code download, download the PDF 32000 specification, and look at the operator summary. Be careful what you delete: for example, "Do" is used both to draw images and to draw form XObjects, which are themselves content streams.
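The filtering idea can be sketched on a simplified model of a content stream. This is not PDFBox code and not a complete solution: the class TextOnlyFilterDemo, the Instruction type and keepText are hypothetical, the set of image names is assumed to be known in advance, and inline images (BI ... EI) and recursion into form XObjects are not modelled. The operator names themselves are the standard ones from the PDF specification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of content stream filtering; hypothetical names, not PDFBox code.
public class TextOnlyFilterDemo {

    // One content stream instruction: an operator with its preceding operands.
    public static final class Instruction {
        public final String operator;
        public final List<String> operands;

        public Instruction(String operator, String... operands) {
            this.operator = operator;
            this.operands = Arrays.asList(operands);
        }
    }

    // Path construction, path painting, clipping and shading operators.
    static final Set<String> NON_TEXT_PAINTING = new HashSet<>(Arrays.asList(
            "m", "l", "c", "v", "y", "h", "re",
            "S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "n",
            "W", "W*", "sh"));

    // Drops non-text painting; "Do" is dropped only when its operand names an image.
    public static List<Instruction> keepText(List<Instruction> stream, Set<String> imageNames) {
        List<Instruction> kept = new ArrayList<>();
        for (Instruction ins : stream) {
            if (NON_TEXT_PAINTING.contains(ins.operator)) {
                continue;
            }
            boolean drawsImage = "Do".equals(ins.operator)
                    && !ins.operands.isEmpty()
                    && imageNames.contains(ins.operands.get(0));
            if (!drawsImage) {
                kept.add(ins);
            }
        }
        return kept;
    }

    // Sample stream: some text, a filled rectangle, an image and a form XObject.
    public static List<Instruction> sample() {
        return Arrays.asList(
                new Instruction("q"),
                new Instruction("BT"),
                new Instruction("Tf", "F1", "12"),
                new Instruction("Tj", "(Hello)"),
                new Instruction("ET"),
                new Instruction("re", "10", "10", "100", "50"),
                new Instruction("f"),
                new Instruction("Do", "Im1"),
                new Instruction("Do", "Fm1"),
                new Instruction("Q"));
    }

    public static void main(String[] args) {
        for (Instruction ins : keepText(sample(), Collections.singleton("Im1"))) {
            System.out.println(String.join(" ", ins.operands) + " " + ins.operator);
        }
    }
}
```

The point of the sketch is the "Do" check: the operator alone does not say whether an image or a form is drawn, so the operand has to be resolved before deciding. A real implementation would look the name up in the page resources, and would also have to filter the form's own content stream.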

See here: How can I remove all images/drawings from a PDF file and leave text only in Java?

Both solutions given there are wrong: the first one just pulls all images out from under the page's feet, and the second one is a good start but does not check whether the parameter is an image or not.

Answered by Tilman Hausherr on Sep 25 '22 at 16:09