I'm having problems with PDFBox 2.0.2 when writing a PDF document from elements of a previously read document (https://www.dropbox.com/s/ttxiv0dq3abh5kj/Test.pdf?dl=0). Everything works fine, except when I call showText on a PDPageContentStream on which I previously set the font with out.setFont(textState.getFont(), textState.getFontSize()) (see the INFORMATION log) and the font is ComicSansMS or ArialBlack. textState is (a clone of) the text state from the previously read document. Writing text with Helvetica or Times-Roman works fine.
INFORMATION: set font PDTrueTypeFont RXNQOL+ComicSansMS,Bold/18.0 embedded
SEVERE: error writing <w>U+0077 is not available in this font's encoding: built-in (TTF)
I suppose the problem may be caused by a missing hyphen or blank in the font name, but I have no clue how to fix this.
Here is the complete code:
import java.awt.Point;
import java.awt.geom.Point2D;
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.contentstream.PDFGraphicsStreamEngine;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.graphics.image.PDImage;
import org.apache.pdfbox.pdmodel.graphics.state.PDTextState;
import org.apache.pdfbox.util.Matrix;
import org.apache.pdfbox.util.Vector;

public class Test extends PDFGraphicsStreamEngine {

    public static void main(String[] args) throws IOException {
        test();
    }

    public static void test() throws IOException {
        PDDocument document = PDDocument.load(new File("Test.pdf"));
        PDPage pageIn = document.getPage(0);
        PDDocument saveDoc = new PDDocument();
        PDPage savePage = new PDPage(pageIn.getMediaBox());
        saveDoc.addPage(savePage);
        try (PDPageContentStream out = new PDPageContentStream(saveDoc, savePage)) {
            Test test = new Test(pageIn, out);
            test.processPage(pageIn);
        }
    }

    private final PDPageContentStream out;

    public Test(PDPage pageIn, PDPageContentStream out) {
        super(pageIn);
        this.out = out;
    }

    @Override
    public void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) throws IOException {
    }

    @Override
    public void clip(int windingRule) throws IOException {
    }

    @Override
    public void closePath() throws IOException {
    }

    @Override
    public void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) throws IOException {
    }

    @Override
    public void drawImage(PDImage pdImage) throws IOException {
    }

    @Override
    public void endPath() throws IOException {
    }

    @Override
    public void fillAndStrokePath(int windingRule) throws IOException {
    }

    @Override
    public void fillPath(int windingRule) throws IOException {
    }

    @Override
    public Point2D getCurrentPoint() {
        return new Point(0, 0);
    }

    @Override
    public void lineTo(float x, float y) throws IOException {
    }

    @Override
    public void moveTo(float x, float y) throws IOException {
    }

    @Override
    public void shadingFill(COSName shadingName) throws IOException {
    }

    @Override
    protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement) throws IOException {
        super.showGlyph(textRenderingMatrix, font, code, unicode, displacement);
        PDTextState textState = getGraphicsState().getTextState();
        out.beginText();
        out.setTextMatrix(getTextMatrix());
        out.setFont(textState.getFont(), textState.getFontSize());
        out.showText(unicode);
        out.endText();
    }

    @Override
    public void strokePath() throws IOException {
    }
}
Any suggestions?
Thanks, Juergen
tl;dr: That font can't be used to encode new text.
The cause of the problem is that your subsetted Comic Sans font does have a "post" (PostScript) table, but its glyphNames table is null, i.e. your font does not have glyph names. For A-Z and a-z, the glyph names are the characters themselves; for "(" the glyph name is "parenleft". Because these names are missing, PDFBox creates pseudo names from the glyph ID, like "90" (instead of "w") for "w", in the second part of PDTrueTypeFont.readEncodingFromFont().
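However, when encoding, PDFBox uses the Adobe Glyph List, as the font does not have an encoding entry. The character "w" is therefore looked up under the glyph name "w", which the font's built-in encoding (holding only pseudo names like "90") does not contain, hence the error. If you look at the other fonts with PDFDebugger, e.g. R18, you'll find "Encoding: WinAnsiEncoding".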
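To make the mismatch concrete, here is a minimal sketch of the lookup in plain Java. It is a simplified model, not PDFBox's actual code: the class and method names are hypothetical, the character codes are made up, and the stand-in for the Adobe Glyph List covers only a couple of characters.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the lookup described above; NOT PDFBox's actual code.
class GlyphNameLookupSketch {

    // Builds a "glyph name -> character code" table the way a reader might when the
    // "post" table has no glyph names: the glyph ID itself is used as a pseudo name.
    // The codes (array indexes) are arbitrary and only serve the illustration.
    static Map<String, Integer> encodingWithPseudoNames(int[] glyphIds) {
        Map<String, Integer> nameToCode = new HashMap<>();
        for (int code = 0; code < glyphIds.length; code++) {
            nameToCode.put(Integer.toString(glyphIds[code]), code);
        }
        return nameToCode;
    }

    // Tiny stand-in for the Adobe Glyph List: letters are named after themselves,
    // "(" is named "parenleft".
    static String glyphNameFor(char c) {
        if (c == '(') {
            return "parenleft";
        }
        return String.valueOf(c);
    }

    // Encodes a character by looking up its glyph name in the font's table.
    static int encode(char c, Map<String, Integer> nameToCode) {
        Integer code = nameToCode.get(glyphNameFor(c));
        if (code == null) {
            throw new IllegalArgumentException(
                    String.format("U+%04X is not available in this font's encoding", (int) c));
        }
        return code;
    }
}
```

With a table built from pseudo names, e.g. `encodingWithPseudoNames(new int[] {90})`, calling `encode('w', table)` throws, because the table only knows the name "90". With a table that maps the real name "w" to a code, the same call succeeds.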
What you are apparently doing is creating a new page with text only. A different way to do this is to analyse the content streams and simply remove all tokens that paint anything other than text. To get started, have a look at the RemoveAllText example in the source code download, download the PDF 32000 specification, look at the "Operator Summary" annex, and be careful about what you delete. For example, "Do" is used both to draw images and to draw form XObjects, which are also content streams.
See here: How can I remove all images/drawings from a PDF file and leave text only in Java?
Both solutions there are wrong: the first one just pulls all images out from under the page's feet, and the second one is a good start but does not check whether the parameter is an image or not.
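The token-filtering idea can be sketched on a toy model of a content stream. This is an illustration only and assumes a lot: the Token class, the method names and the isImageXObject callback are hypothetical, real code would read tokens with PDFBox's parser as the RemoveAllText example does, and inline images (BI ... ID ... EI) are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Toy model of a content stream; the class and method names are hypothetical.
class TextOnlyFilterSketch {

    // A token is either an operand ("/F1", "12", "(Hi)") or an operator ("Tf", "Tj").
    static final class Token {
        final String text;
        final boolean operator;

        private Token(String text, boolean operator) {
            this.text = text;
            this.operator = operator;
        }

        static Token op(String text) { return new Token(text, true); }
        static Token arg(String text) { return new Token(text, false); }
    }

    // Operators dropped unconditionally in this sketch.
    static final Set<String> DROPPED = new HashSet<>(Arrays.asList(
            "m", "l", "c", "v", "y", "h", "re",                  // path construction
            "S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "n", // path painting
            "W", "W*",                                           // clipping (its path is gone)
            "sh"));                                              // shading fill

    // Copies the tokens, leaving out dropped operators together with their operands.
    static List<Token> keepTextOnly(List<Token> tokens, Predicate<String> isImageXObject) {
        List<Token> result = new ArrayList<>();
        List<Token> operands = new ArrayList<>();
        for (Token token : tokens) {
            if (!token.operator) {
                operands.add(token); // operands precede their operator
                continue;
            }
            boolean drop = DROPPED.contains(token.text);
            if ("Do".equals(token.text) && operands.size() == 1) {
                // "Do" paints images and form XObjects alike, so ask the caller.
                drop = isImageXObject.test(operands.get(0).text);
            }
            if (!drop) {
                result.addAll(operands);
                result.add(token);
            }
            operands.clear();
        }
        return result;
    }

    // Joins the token texts with single spaces, for inspection.
    static String render(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Token token : tokens) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(token.text);
        }
        return sb.toString();
    }
}
```

Form XObjects that survive the "Do" check are content streams themselves, so they would need the same filtering applied recursively. Dropping the clipping operators also changes which parts of the text are visible, which is one more reason to be careful about what you delete.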