
Add page as layer from separate pdf(different page size) using pdfbox

Tags:

java

pdf

pdfbox

How can I add a page from an external PDF document to a destination PDF if the pages have different sizes? Here is what I'd like to accomplish: [image]

I tried to use LayerUtility (as in this example: PDFBox LayerUtility - Importing layers into existing PDF), but once I import a page from the external PDF, the process hangs:

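// PDFBox 1.8.x imports
import java.awt.geom.AffineTransform;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectForm;
import org.apache.pdfbox.util.LayerUtility;

PDDocument destinationPdfDoc = PDDocument.load(fileInputStream);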
PDDocument externalPdf = PDDocument.load(EXTERNAL PDF);

List<PDPage> destinationPages = destinationPdfDoc.getDocumentCatalog().getAllPages();

LayerUtility layerUtility = new LayerUtility(destinationPdfDoc);

// process hangs here
PDXObjectForm firstForm = layerUtility.importPageAsForm(externalPdf, 0);

AffineTransform affineTransform = new AffineTransform();
layerUtility.appendFormAsLayer(destinationPages.get(0), firstForm, affineTransform, "external page");


destinationPdfDoc.save(resultTempFile);

destinationPdfDoc.close();
externalPdf.close();

What am I doing wrong?

El Kopyto asked Feb 03 '15 09:02


1 Answer

PDFBox dependencies

The main issue was a missing dependency: PDFBox consists of three core components and has one required external dependency, and one of the core components was missing.

In a comment, the OP clarified:

Actually process doesn't hangs, the file is just not created at all.

As this sounded as if an exception or error might have occurred, it was proposed in chat to wrap the code in a try { ... } catch (Throwable t) { t.printStackTrace(); } block. And indeed, this revealed:

java.lang.NoClassDefFoundError: org/apache/fontbox/util/BoundingBox 
    at org.apache.pdfbox.util.LayerUtility.importPageAsForm(LayerUtility.java:203) 
    at org.apache.pdfbox.util.LayerUtility.importPageAsForm(LayerUtility.java:135) 
    at ...
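The reason for catching Throwable rather than Exception is that NoClassDefFoundError extends Error, so a catch (Exception e) block does not see it. The following minimal sketch (plain JDK, no PDFBox needed) illustrates this; the class and method names are made up for illustration, and the error is thrown by hand instead of by a genuinely missing jar.

```java
// Sketch: why the suggested catch block uses Throwable instead of Exception.
final class CatchThrowableDemo {

    // Runs the task and returns "ClassName: message" for anything it throws,
    // or null if it completes normally. Catching Throwable also covers Errors
    // such as NoClassDefFoundError, which catch (Exception e) would not see.
    static String runAndReport(Runnable task) {
        try {
            task.run();
            return null;
        } catch (Throwable t) {
            t.printStackTrace(); // full stack trace goes to System.err
            return t.getClass().getName() + ": " + t.getMessage();
        }
    }

    public static void main(String[] args) {
        // Simulated failure: thrown by hand here, whereas a missing fontbox.jar
        // makes the JVM throw it when the class is first needed.
        String report = runAndReport(() -> {
            throw new NoClassDefFoundError("org/apache/fontbox/util/BoundingBox");
        });
        System.out.println(report);
        // NoClassDefFoundError is an Error, not an Exception:
        System.out.println(Exception.class.isAssignableFrom(NoClassDefFoundError.class));
    }
}
```

Running main prints java.lang.NoClassDefFoundError: org/apache/fontbox/util/BoundingBox followed by false.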

As it turned out, fontbox.jar was missing from the OP's setup.

The dependencies of PDFBox version 1.8.x are described on the dependencies page of the PDFBox website. In particular, the three core components pdfbox, fontbox, and jempbox must all be present in the same version, and commons-logging is a required dependency.

As soon as the missing component had been added, the sample worked properly.
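To catch such a setup problem early, one could probe for one class from each component before doing any PDF work. This is a minimal sketch under stated assumptions: org.apache.fontbox.util.BoundingBox comes from the stack trace above and PDDocument from the question, while org.apache.jempbox.xmp.XMPMetadata and org.apache.commons.logging.Log are assumed representatives for jempbox and commons-logging and should be checked against the jars actually in use. It only checks that the classes can be found, not that the component versions match.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: report which PDFBox 1.8.x components are not on the classpath.
final class PdfBoxDependencyCheck {

    // One probe class per component (assumed names, see the lead-in).
    static final Map<String, String> PROBES = new LinkedHashMap<>();
    static {
        PROBES.put("pdfbox", "org.apache.pdfbox.pdmodel.PDDocument");
        PROBES.put("fontbox", "org.apache.fontbox.util.BoundingBox");
        PROBES.put("jempbox", "org.apache.jempbox.xmp.XMPMetadata");
        PROBES.put("commons-logging", "org.apache.commons.logging.Log");
    }

    // Returns the names of the components whose probe class cannot be loaded.
    static List<String> missingComponents(Map<String, String> probes) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = PdfBoxDependencyCheck.class.getClassLoader();
        for (Map.Entry<String, String> probe : probes.entrySet()) {
            try {
                // initialize = false: locate and load the class without running its static init
                Class.forName(probe.getValue(), false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError: the class was found, but something it needs was not
                missing.add(probe.getKey());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingComponents(PROBES);
        if (missing.isEmpty()) {
            System.out.println("All PDFBox components found.");
        } else {
            System.out.println("Missing from classpath: " + missing);
        }
    }
}
```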

Positioning the imported page

The imported page can be positioned on the target page by means of a translation in the AffineTransform parameter. This parameter also allows for other transformations, e.g. scaling, rotating, mirroring, or skewing.
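As a sketch of how such a transformation can be composed with java.awt.geom.AffineTransform (the class the question passes to appendFormAsLayer), the following combines a scaling and a translation. The helper name scaleThenTranslate and the numbers are hypothetical. Because AffineTransform methods such as scale concatenate onto the existing transform, the operation called last is the first one applied to the form's coordinates.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

// Sketch: building a placement transform for an imported form.
final class PlacementTransform {

    // Scales the form uniformly by `scale`, then moves its origin to (tx, ty).
    // at.scale(...) post-concatenates, so form points are scaled first and
    // translated afterwards.
    static AffineTransform scaleThenTranslate(double tx, double ty, double scale) {
        AffineTransform at = AffineTransform.getTranslateInstance(tx, ty);
        at.scale(scale, scale);
        return at;
    }

    public static void main(String[] args) {
        AffineTransform at = scaleThenTranslate(10, 20, 0.5);
        // The form point (100, 200) is scaled to (50, 100), then shifted by (10, 20).
        Point2D p = at.transform(new Point2D.Double(100, 200), null);
        System.out.println(p.getX() + ", " + p.getY()); // 60.0, 120.0
    }
}
```

Other operations (e.g. at.rotate(...) or a mirroring scale like at.scale(-1, 1)) can be chained the same way before the transform is handed to appendFormAsLayer.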

For the original sample files, this PDF page

[image: source page from test-pdf.pdf]

was added onto this page

[image]

which resulted in this page

[image: result of the OP's original code]

The OP then wondered

how to position the imported layer

The relevant parameter of the layerUtility.appendFormAsLayer call is AffineTransform affineTransform. The OP used new AffineTransform(), which creates an identity matrix; this in turn causes the source page to be added at the origin of the coordinate system, in this case at the bottom of the page.

By using a translation instead of the identity, e.g.

PDRectangle destCrop = destinationPages.get(0).findCropBox();
PDRectangle sourceBox = firstForm.getBBox();
AffineTransform affineTransform = AffineTransform.getTranslateInstance(0, destCrop.getUpperRightY() - sourceBox.getHeight());

one can position the source page elsewhere, e.g. at the top:

[image: result using the translation above]
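The arithmetic behind that translation can be checked with plain numbers. In this sketch, 842 stands in for destCrop.getUpperRightY() (roughly the height of an A4 page in points) and 300 for sourceBox.getHeight(); both values and the helper name topAligned are illustrative. Like the formula above, it assumes the form's bounding box starts at y = 0.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

// Sketch: the top-alignment translation with hypothetical page sizes.
final class TopAlign {

    // Translation that moves a form of the given height so that its top edge
    // touches the upper edge of the destination crop box.
    // Assumes the form's bounding box has its lower edge at y = 0.
    static AffineTransform topAligned(double destUpperRightY, double sourceHeight) {
        return AffineTransform.getTranslateInstance(0, destUpperRightY - sourceHeight);
    }

    public static void main(String[] args) {
        AffineTransform at = topAligned(842, 300);
        Point2D bottom = at.transform(new Point2D.Double(0, 0), null);  // form's lower edge
        Point2D top = at.transform(new Point2D.Double(0, 300), null);   // form's upper edge
        System.out.println(bottom.getY() + " " + top.getY()); // 542.0 842.0
    }
}
```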

PDFBox LayerUtility's expectations

Unfortunately, it turns out that layerUtility.appendFormAsLayer appends the form to the page without resetting the graphics context.

layerUtility.appendFormAsLayer uses this code to add an additional content stream:

PDPageContentStream contentStream = new PDPageContentStream(
        targetDoc, targetPage, true, !DEBUG);

Unfortunately, a content stream generated by this constructor inherits the graphics state as it is at the end of the existing content of the target page. In particular, the user space coordinate system may no longer be in its default state; some software, for example, mirrors the coordinate system so that y coordinates increase downwards.
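To see why the inherited state matters, assume the existing content leaves a mirrored coordinate system active, i.e. the matrix [1 0 0 -1 0 h] with h the page height. The placement matrix is then concatenated with that leftover matrix instead of with the identity. The following sketch models only this matrix arithmetic with java.awt.geom.AffineTransform; the page height, the form height, and the names are hypothetical, and it assumes the placement reaches the page as a cm (concatenate matrix) operation.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

// Sketch: how a leftover mirrored coordinate system changes where an
// appended form lands. All numbers are hypothetical.
final class InheritedCtm {

    // Device-space y of a form point, given the CTM in effect when the new
    // content starts and the placement applied via "cm". Concatenation means
    // the placement acts on form coordinates first, the inherited CTM after.
    static double deviceY(AffineTransform inheritedCtm, AffineTransform placement, double formY) {
        AffineTransform ctm = new AffineTransform(inheritedCtm);
        ctm.concatenate(placement);
        return ctm.transform(new Point2D.Double(0, formY), null).getY();
    }

    public static void main(String[] args) {
        double pageHeight = 842;
        double formHeight = 300;
        // Top-aligned placement, as computed for a default coordinate system.
        AffineTransform placement = AffineTransform.getTranslateInstance(0, pageHeight - formHeight);

        // Default graphics state: identity CTM.
        AffineTransform identity = new AffineTransform();
        // Mirrored state [1 0 0 -1 0 h]: y grows downwards from the top edge.
        AffineTransform mirrored = new AffineTransform(1, 0, 0, -1, 0, pageHeight);

        // Where does the form's top edge (form y = 300) end up?
        System.out.println(deviceY(identity, placement, formHeight)); // 842.0, top of page
        System.out.println(deviceY(mirrored, placement, formHeight)); // 0.0, bottom of page
    }
}
```

With the mirrored state the form's top edge lands at the bottom of the page, and the form is drawn upside down.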

If instead

PDPageContentStream contentStream = new PDPageContentStream(
        targetDoc, targetPage, true, !DEBUG, true);

had been used, the graphics state would have been reset to its default and would thus be known.

By itself, therefore, this method cannot be used in a controlled manner on arbitrary input.

Fortunately, though, LayerUtility also has a method wrapInSaveRestore(PDPage) that overcomes this weakness by manipulating the content of the given page so that the default graphics state is in effect at its end.

Thus, one should replace

layerUtility.appendFormAsLayer(destinationPages.get(0), firstForm, affineTransform, "external page");

by

PDPage destPage = destinationPages.get(0);
layerUtility.wrapInSaveRestore(destPage);
layerUtility.appendFormAsLayer(destPage, firstForm, affineTransform, "external page");
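The effect of the wrapInSaveRestore call can be illustrated with a toy model that tracks nothing but the current transformation matrix through a sequence of content stream operations. It assumes, as the method name suggests, that the existing content gets bracketed by q (save graphics state) and Q (restore graphics state). The class, the method, and the mirror matrix are hypothetical; this is not PDFBox code.

```java
import java.awt.geom.AffineTransform;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Toy model: follow only the CTM through q / Q / cm operations.
final class SaveRestoreModel {

    // Each operation is the String "q" (save), the String "Q" (restore), or an
    // AffineTransform standing for a "cm" operator. Anything else causes a
    // ClassCastException; an unbalanced "Q" makes pop() throw.
    static AffineTransform ctmAfter(List<Object> ops) {
        AffineTransform ctm = new AffineTransform(); // default state: identity
        Deque<AffineTransform> saved = new ArrayDeque<>();
        for (Object op : ops) {
            if ("q".equals(op)) {
                saved.push(new AffineTransform(ctm)); // remember a copy of the current CTM
            } else if ("Q".equals(op)) {
                ctm = saved.pop();                    // back to the last saved CTM
            } else {
                ctm.concatenate((AffineTransform) op);
            }
        }
        return ctm;
    }

    public static void main(String[] args) {
        AffineTransform mirror = new AffineTransform(1, 0, 0, -1, 0, 842);
        // Existing content that flips the y axis and never restores it.
        List<Object> unwrapped = Arrays.<Object>asList(mirror);
        // The same content bracketed by q ... Q.
        List<Object> wrapped = Arrays.<Object>asList("q", mirror, "Q");
        System.out.println(ctmAfter(unwrapped).isIdentity()); // false
        System.out.println(ctmAfter(wrapped).isIdentity());   // true
    }
}
```

Only in the wrapped case does content appended afterwards start from the default coordinate system, which is what the translation computed earlier relies on.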
mkl answered Oct 23 '22 02:10