
Does PDFBox allow to remove one field from AcroForm?

Tags: java, pdfbox

I am using Apache PDFBox 2.0.8 and trying to remove a single field, but I cannot find a way to do it the way I can with iText: PdfStamper.getAcroFields().removeField("signature3").

What I am trying to do: I start with a template PDF that has 3 digital signatures. In some cases I need just 2 signatures, so in those cases I need to remove the 3rd signature from the template. It seems I can't do that with PDFBox. The closest thing I found is flattening the field, but the problem is that if I flatten one particular PDField (not the whole form, just one field), all the other signatures lose their functionality; it looks like they get flattened as well. Here is the code that does it:

PDDocument document = PDDocument.load(file);
PDDocumentCatalog documentCatalog = document.getDocumentCatalog();
PDAcroForm acroForm = documentCatalog.getAcroForm();

List<PDField> flattenList = new ArrayList<>();
for (PDField field : acroForm.getFieldTree()) {
    if (field instanceof PDSignatureField && "signature3".equals(field.getFullyQualifiedName())) {
        flattenList.add(field);
    }
}

acroForm.flatten(flattenList, true);
document.save(dest);        
document.close();
Renat Gatin asked Dec 13 '25 07:12


1 Answer

As Tilman already mentioned in a comment, PDFBox doesn't have a method to remove a field from the field tree. Nonetheless it has methods to manipulate the underlying PDF structure, so one can write such a method oneself, e.g. like this:

PDField removeField(PDDocument document, String fullFieldName) throws IOException {
    PDDocumentCatalog documentCatalog = document.getDocumentCatalog();
    PDAcroForm acroForm = documentCatalog.getAcroForm();

    if (acroForm == null) {
        System.out.println("No form defined.");
        return null;
    }

    PDField targetField = null;

    for (PDField field : acroForm.getFieldTree()) {
        if (fullFieldName.equals(field.getFullyQualifiedName())) {
            targetField = field;
            break;
        }
    }
    if (targetField == null) {
        System.out.println("Form does not contain field with given name.");
        return null;
    }

    PDNonTerminalField parentField = targetField.getParent();
    if (parentField != null) {
        List<PDField> childFields = parentField.getChildren();
        boolean removed = false;
        for (PDField field : childFields)
        {
            if (field.getCOSObject().equals(targetField.getCOSObject())) {
                removed = childFields.remove(field);
                parentField.setChildren(childFields);
                break;
            }
        }
        if (!removed)
            System.out.println("Inconsistent form definition: Parent field does not reference the target field.");
    } else {
        List<PDField> rootFields = acroForm.getFields();
        boolean removed = false;
        for (PDField field : rootFields)
        {
            if (field.getCOSObject().equals(targetField.getCOSObject())) {
                removed = rootFields.remove(field);
                break;
            }
        }
        if (!removed)
            System.out.println("Inconsistent form definition: Root fields do not include the target field.");
    }

    removeWidgets(targetField);

    return targetField;
}

void removeWidgets(PDField targetField) throws IOException {
    if (targetField instanceof PDTerminalField) {
        List<PDAnnotationWidget> widgets = ((PDTerminalField)targetField).getWidgets();
        for (PDAnnotationWidget widget : widgets) {
            PDPage page = widget.getPage();
            if (page != null) {
                List<PDAnnotation> annotations = page.getAnnotations();
                boolean removed = false;
                for (PDAnnotation annotation : annotations) {
                    if (annotation.getCOSObject().equals(widget.getCOSObject()))
                    {
                        removed = annotations.remove(annotation);
                        break;
                    }
                }
                if (!removed)
                    System.out.println("Inconsistent annotation definition: Page annotations do not include the target widget.");
            } else {
                System.out.println("Widget annotation does not have an associated page; cannot remove widget.");
                // TODO: In this case iterate all pages and try to find and remove widget in all of them
            }
        }
    } else if (targetField instanceof PDNonTerminalField) {
        List<PDField> childFields = ((PDNonTerminalField)targetField).getChildren();
        for (PDField field : childFields)
            removeWidgets(field);
    } else {
        System.out.println("Target field is neither terminal nor non-terminal; cannot remove widgets.");
    }
}

(RemoveField helper methods removeField and removeWidgets)

One can apply this to a document and field like this:

PDDocument document = PDDocument.load(SOURCE_PDF);

PDField field = removeField(document, "Signature1");
Assert.assertNotNull("Field not found", field);

document.save(TARGET_PDF);        
document.close();

(RemoveField test testRemoveInvisibleSignature)


PS: I am not sure how much form-related information PDFBox caches internally. I would therefore suggest not manipulating the form any further in the same document manipulation session, at least not without tests.
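One cautious way to follow that advice is to write the modified document out and load it again before touching the form any further, so that later changes start from freshly parsed objects. The following is only a sketch under assumptions: it targets the PDFBox 2.0.x API (PDDocument.load(byte[]) is not available in 3.x), the class and method names ReloadAfterRemoval and saveAndReload are made up for illustration, and whether the reload is actually necessary has not been verified.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;

public class ReloadAfterRemoval {
    /**
     * Hypothetical helper (not part of PDFBox): saves the document to memory,
     * closes it, and returns a freshly loaded copy. The caller is responsible
     * for closing the returned document.
     */
    static PDDocument saveAndReload(PDDocument document) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            // Full save of the current in-memory state, including the field removal.
            document.save(buffer);
        } finally {
            // The old instance is no longer needed once its bytes are written.
            document.close();
        }
        // PDFBox 2.0.x API; parses the saved bytes into a new object tree.
        return PDDocument.load(buffer.toByteArray());
    }
}
```

After calling removeField, one would continue with the document returned by saveAndReload instead of the original instance.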

PPS: You'll find a TODO in the removeWidgets helper method. If the method prints "Widget annotation does not have an associated page; cannot remove widget.", you'll have to add the missing code.
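A possible way to fill in that TODO is to scan every page of the document and drop any annotation entry that wraps the same COS dictionary as the widget. This is a minimal, untested-against-real-files sketch assuming PDFBox 2.0.x; WidgetFallback and removeWidgetFromAllPages are hypothetical names, and the method needs the PDDocument, which the original removeWidgets signature does not receive, so that signature would have to be extended accordingly. The page annotations are rebuilt into a plain ArrayList and written back with setAnnotations rather than removed in place, to avoid depending on how the list returned by getAnnotations propagates removals.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;

public class WidgetFallback {
    /**
     * Hypothetical fallback for widgets without a page reference: checks the
     * annotations of all pages and removes entries backed by the widget's
     * dictionary.
     *
     * @return the number of annotation entries removed across all pages
     */
    static int removeWidgetFromAllPages(PDDocument document, PDAnnotationWidget widget)
            throws IOException {
        int removedCount = 0;
        for (PDPage page : document.getPages()) {
            List<PDAnnotation> kept = new ArrayList<>();
            boolean found = false;
            for (PDAnnotation annotation : page.getAnnotations()) {
                // Compare the underlying dictionaries, not the wrapper objects,
                // because PDFBox creates new wrappers on each getAnnotations() call.
                if (annotation.getCOSObject() == widget.getCOSObject()) {
                    found = true;
                    removedCount++;
                } else {
                    kept.add(annotation);
                }
            }
            if (found) {
                // Write the filtered list back to the page's /Annots entry.
                page.setAnnotations(kept);
            }
        }
        return removedCount;
    }
}
```

In removeWidgets, the else branch that currently only prints the message could call this helper and report an inconsistency if it returns 0.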

mkl answered Dec 14 '25 19:12