Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PDFBox: How to "flatten" a PDF-form?

How do I "flatten" a PDF-form (remove the form-field but keep the text of the field) with PDFBox?

Same question was answered here:

a quick way to do this, is to remove the fields from the acrofrom.

For this you just need to get the document catalog, then the acroform and then remove all fields from this acroform.

The graphical representation is linked with the annotation and stay in the document.

So I wrote this code:

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

public class PdfBoxTest {
    public void test() throws Exception {
        PDDocument pdDoc = PDDocument.load(new File("E:\\Form-Test.pdf"));
        PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
        PDAcroForm acroForm = pdCatalog.getAcroForm();

        if (acroForm == null) {
            System.out.println("No form-field --> stop");
            return;
        }

        @SuppressWarnings("unchecked")
        List<PDField> fields = acroForm.getFields();

        // set the text in the form-field <-- does work
        for (PDField field : fields) {
            if (field.getFullyQualifiedName().equals("formfield1")) {
                field.setValue("Test-String");
            }
        }

        // remove form-field but keep text ???
        // acroForm.getFields().clear();         <-- does not work
        // acroForm.setFields(null);             <-- does not work
        // acroForm.setFields(new ArrayList());  <-- does not work
        // ???

        pdDoc.save("E:\\Form-Test-Result.pdf");
        pdDoc.close();
    }
}
like image 888
Lukas Avatar asked Jan 22 '13 08:01

Lukas


People also ask

Is PDFBox free to use?

Bookmark this question. Show activity on this post. PDFbox is that PDFbox is the free version.

Is PDFBox thread safe?

Is PDFBox thread safe? No! Only one thread may access a single document at a time. You can have multiple threads each accessing their own PDDocument object.


2 Answers

With PDFBox 2 it's now possible to "flatten" a PDF-form easily by calling the flatten method on a PDAcroForm object. See Javadoc: PDAcroForm.flatten().

Simplified code with an example call of this method:

//Load the document
PDDocument pDDocument = PDDocument.load(new File("E:\\Form-Test.pdf"));    
PDAcroForm pDAcroForm = pDDocument.getDocumentCatalog().getAcroForm();

//Fill the document
...

//Flatten the document
pDAcroForm.flatten();

//Save the document
pDDocument.save("E:\\Form-Test-Result.pdf");
pDDocument.close();

Note: dynamic XFA forms cannot be flatten.

For migration from PDFBox 1.* to 2.0, take a look at the official migration guide.

like image 51
Sylvain Bugat Avatar answered Sep 20 '22 12:09

Sylvain Bugat


This works for sure - I've ran into this problem, debugged all-night, but finally figured out how to do this :)

This is assuming that you have capability to edit the PDF in some way/have some control over the PDF.

First, edit the forms using Acrobat Pro. Make them hidden and read-only.

Then you need to use two libraries: PDFBox and PDFClown.

PDFBox removes the thing that tells Adobe Reader that it's a form; PDFClown removes the actual field. PDFClown must be done first, then PDFBox (in that order. The other way around doesn't work).

Single field example code:

// PDF Clown code
File file = new File("Some file path"); 
Document document = file.getDocument();
Form form = file.getDocument.getForm();
Fields fields = form.getFields();
Field field = fields.get("some_field_name");

PageStamper stamper = new PageStamper(); 
FieldWidgets widgets = field.getWidgets();
Widget widget = widgets.get(0); // Generally is 0.. experiment to figure out
stamper.setPage(widget.getPage());

// Write text using text form field position as pivot.
PrimitiveComposer composer = stamper.getForeground();
Font font = font.get(document, "some_path"); 
composer.setFont(font, 10); 
double xCoordinate = widget.getBox().getX();
double yCoordinate = widget.getBox().getY(); 
composer.showText("text i want to display", new Point2D.Double(xCoordinate, yCoordinate)); 

// Actually delete the form field!
field.delete();
stamper.flush(); 

// Create new buffer to output to... 
Buffer buffer = new Buffer();
file.save(buffer, SerializationModeEnum.Standard); 
byte[] bytes = buffer.toByteArray(); 

// PDFBox code
InputStream pdfInput = new ByteArrayInputStream(bytes);
PDDocument pdfDocument = PDDocument.load(pdfInput);

// Tell Adobe we don't have forms anymore.
PDDocumentCatalog pdCatalog = pdfDocument.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
COSDictionary acroFormDict = acroForm.getDictionary();
COSArray cosFields = (COSArray) acroFormDict.getDictionaryObject("Fields");
cosFields.clear();

// Phew. Finally.
pdfDocument.save("Some file path");

Probably some typos here and there, but this should be enough to get the gist :)

like image 33
bfjules Avatar answered Sep 22 '22 12:09

bfjules