
How to check a checkbox in PDF file with the same variable name with iText and Java

Tags: java, pdf, itext

I have been using the iText library for Java to fill out a PDF document automatically. The first thing I do is map every field. Once every field is mapped, I store the field names in Strings so they are easy to access.

So far, so good. The problem is that I have a group of 6 checkboxes with the same field name. For example, they are all named topmostSubform[0].Page2[0].p2_cb01[0].

Through testing I figured out that if I check the first checkbox, then topmostSubform[0].Page2[0].p2_cb01[0] = 1.

If I check the second one (which unchecks the first automatically), then topmostSubform[0].Page2[0].p2_cb01[0] = 2.

The third gives topmostSubform[0].Page2[0].p2_cb01[0] = 3, and so on up to 6 for the last one.

I am using form.setField("topmostSubform[0].Page2[0].p2_cb01[0]", "1"); to fill in the fields. When I set the value 1, the first checkbox gets checked, but when I set the value 2, which should check the second checkbox, it does not work. It does not matter whether I choose 2, 3, 4, 5 or 6: the checkboxes stay empty and I can't check them.

Here is a piece of the code:

String _5_1 = "topmostSubform[0].Page2[0].p2_cb01[0]";
AcroFields form = stamper.getAcroFields();
form.setField(_5_1, "3");

Please, I need suggestions.

José Mendes asked Apr 17 '15 15:04


1 Answer

Allow me to quote from ISO-32000-1 section 12.7.3.2 "Field names":

It is possible for different field dictionaries to have the same fully qualified field name if they are descendants of a common ancestor with that name and have no partial field names (T entries) of their own. Such field dictionaries are different representations of the same underlying field; they should differ only in properties that specify their visual appearance. In particular, field dictionaries with the same fully qualified field name shall have the same field type (FT), value (V), and default value (DV).

If we apply this to your question: it is possible for different field dictionaries to have the same name topmostSubform[0].Page2[0].p2_cb01[0]. Such field dictionaries are different representations of the same field and they shall have the same value.
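To make the quoted rule concrete, here is a minimal sketch of how a fully qualified field name is built by joining partial names with periods. The FieldNode class is a hypothetical, simplified model written for this illustration; it is not part of iText or of the PDF specification.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model of a PDF field dictionary: it keeps only
// the partial name (the /T entry, which may be absent) and the parent link.
final class FieldNode {
    final String partialName; // null when the dictionary has no /T entry
    final FieldNode parent;   // null for a root field

    FieldNode(String partialName, FieldNode parent) {
        this.partialName = partialName;
        this.parent = parent;
    }

    // Joins the partial names from the root down to this node with periods.
    // A node without a partial name contributes nothing, so it shares the
    // fully qualified name of its nearest named ancestor.
    String fullyQualifiedName() {
        Deque<String> parts = new ArrayDeque<>();
        for (FieldNode n = this; n != null; n = n.parent) {
            if (n.partialName != null) {
                parts.addFirst(n.partialName);
            }
        }
        return String.join(".", parts);
    }
}

final class FieldNameDemo {
    public static void main(String[] args) {
        FieldNode root = new FieldNode("topmostSubform[0]", null);
        FieldNode page = new FieldNode("Page2[0]", root);
        FieldNode field = new FieldNode("p2_cb01[0]", page);
        // Two children without a /T entry: different dictionaries, same name.
        FieldNode widgetA = new FieldNode(null, field);
        FieldNode widgetB = new FieldNode(null, field);
        System.out.println(widgetA.fullyQualifiedName());
        System.out.println(widgetA.fullyQualifiedName().equals(widgetB.fullyQualifiedName()));
    }
}
```

In this model widgetA and widgetB are the "different representations of the same underlying field" that the specification describes, which is why they must share one value.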

There are two options:

  1. If you have a PDF with field dictionaries with name (topmostSubform[0].Page2[0].p2_cb01[0]) that have different values, then you don't have a valid PDF file: it is in violation of ISO-32000-1, which is the official PDF specification.
  2. Maybe you think that you have check boxes with the same field name and different values, but maybe those check boxes are in reality a radio field with different radio buttons. Maybe you are not using the correct values. Maybe something else is at play. For an SO reader to be able to help you, they'd need to see the PDF file.

If option 1 applies, abandon all hope: you have a bad PDF. Fix it or throw it away. If option 2 applies, please share the PDF.

Update after inspecting the PDF file:

Option 2 applies. You have a hybrid form, which means that the form is described twice inside the PDF, once using AcroForm technology and once using XFA. Please start by reading my answer to the following question: PDFTK and removing the XFA format

When you open the PDF in Adobe Reader, you will notice that the fields act as if they are radio buttons. When you click one, it is selected; when you then click another, that one is selected and the first one is deselected.

What you see, is the form as described in XFA, and there are some important differences between the XFA form and the AcroForm description. This isn't an error. It's inherent to hybrid forms.

When you fill out the form using:

form.setField("topmostSubform[0].Page2[0].p2_cb01[0]", "1");

iText fills out the AcroForm correctly, but it fails at filling out the XFA form because iText makes an educated guess (not an accurate guess) as to where the corresponding value should be set in the XFA stream (which is actually expressed in XML). This is explained in more detail in chapter 8 of iText in Action - Second Edition.

What I usually do in such cases is exactly what the person who asked if he could safely throw away the XFA part does: I remove the XFA part:

AcroFields form = stamper.getAcroFields();
form.removeXfa();

This simplifies things dramatically, but it doesn't solve your problem yet. To solve your problem, we need to look inside the PDF:

[Screenshot: the form's /Fields array and /XFA entry, as shown in iText RUPS]

As you can see in the screen shot (taken from iText RUPS), there are two different descriptions for the form: you have a /Fields array (the AcroForm description) and you have an /XFA part that consists of different streams that, if you join them, form a large XML file.

We also see that where you think there's a single field topmostSubform[0].Page2[0].p2_cb01[0], there are in reality 6 fields:

topmostSubform[0].Page2[0].p2_cb01[0]
topmostSubform[0].Page2[0].p2_cb01[1]
topmostSubform[0].Page2[0].p2_cb01[2]
topmostSubform[0].Page2[0].p2_cb01[3]
topmostSubform[0].Page2[0].p2_cb01[4]
topmostSubform[0].Page2[0].p2_cb01[5]

Now let's take a look inside those fields.

This is field topmostSubform[0].Page2[0].p2_cb01[0]:

[Screenshot: the field dictionary of topmostSubform[0].Page2[0].p2_cb01[0]]

This is field topmostSubform[0].Page2[0].p2_cb01[1]:

[Screenshot: the field dictionary of topmostSubform[0].Page2[0].p2_cb01[1]]

These are AcroForm check boxes, but there is an instruction meant for humans that says: select only one. This instruction can be understood by humans only, not by machines or software.

My first attempt at writing the FillHybridForm example failed because I made a similar error to yours. I didn't look closely enough at the different appearance states. I thought that the On value of topmostSubform[0].Page2[0].p2_cb01[0] was 0, of topmostSubform[0].Page2[0].p2_cb01[1] was 1, and so on. It wasn't... The On value of topmostSubform[0].Page2[0].p2_cb01[0] was 1, of topmostSubform[0].Page2[0].p2_cb01[1] was 2, and so on.
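Given that mapping and the form's "select only one" instruction, the values for a single selection can be computed up front. The following is a minimal sketch, not part of the original example: the SelectOneCheckBox class and valuesFor method are hypothetical names, the rule "On value = index + 1" is assumed from what was observed in this particular form, and "Off" is assumed to be the name of the unchecked state.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SelectOneCheckBox {
    // Builds the value to set for each of the `count` check boxes named
    // prefix[0] .. prefix[count - 1] so that only option `selected`
    // (1-based) ends up checked.
    // Assumptions: the On value of prefix[i] is i + 1 (as observed in this
    // form) and "Off" is the name of the unchecked appearance state.
    static Map<String, String> valuesFor(String prefix, int count, int selected) {
        if (selected < 1 || selected > count) {
            throw new IllegalArgumentException("selected must be between 1 and " + count);
        }
        Map<String, String> values = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            String onValue = String.valueOf(i + 1);
            values.put(prefix + "[" + i + "]", (i + 1 == selected) ? onValue : "Off");
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, String> values = valuesFor("topmostSubform[0].Page2[0].p2_cb01", 6, 3);
        for (Map.Entry<String, String> e : values.entrySet()) {
            // With iText, each pair would be passed on as
            // form.setField(e.getKey(), e.getValue()).
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Keeping the mapping in one place avoids the off-by-one mistake described above, and it means only one check box is ever switched on even though the AcroForm itself does not enforce that.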

This is how you can fill out all the check boxes:

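import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.AcroFields;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import java.io.FileOutputStream;
import java.io.IOException;

public class FillHybridForm {
    public void manipulatePdf(String src, String dest) throws DocumentException, IOException {
        PdfReader reader = new PdfReader(src);
        PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
        AcroFields form = stamper.getAcroFields();
        // Drop the XFA description so that only the AcroForm remains.
        form.removeXfa();
        // Each check box is a separate field; its On value is its index + 1.
        form.setField("topmostSubform[0].Page2[0].p2_cb01[0]", "1");
        form.setField("topmostSubform[0].Page2[0].p2_cb01[1]", "2");
        form.setField("topmostSubform[0].Page2[0].p2_cb01[2]", "3");
        form.setField("topmostSubform[0].Page2[0].p2_cb01[3]", "4");
        form.setField("topmostSubform[0].Page2[0].p2_cb01[4]", "5");
        form.setField("topmostSubform[0].Page2[0].p2_cb01[5]", "6");
        stamper.close();
        reader.close();
    }
}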

Now all the check boxes are checked. See f8966_filled.pdf:

[Screenshot: f8966_filled.pdf with all six check boxes checked]

Of course, being human, we know that we shouldn't do this, because we should treat the fields as if they were radio buttons, but there is no technical reason in the AcroForm description why we couldn't. The logic that prevents us from doing so is present only in the XFA description.

This solves your problem if it is acceptable to throw away the XFA part. It will also solve your problem if it's OK to flatten the form, in which case you should add:

stamper.setFormFlattening(true);

If the above options aren't acceptable, you shouldn't throw away the XFA part. Instead, fill out the AcroForm part as described above, use iText to extract the XML dataset (see datasets in the first screen shot), update it the way the US government expects you to update it, and use iText to put the updated dataset back in the datasets object.
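The "update the XML dataset" step can be done with the standard Java XML APIs once the dataset has been extracted. Below is a rough sketch under stated assumptions: the DatasetUpdateSketch class is hypothetical, and the sample XML in main only guesses at the shape of the dataset (element names mirroring the field names); the real structure has to be read from the actual PDF. Extracting the dataset from the PDF and writing it back with iText are deliberately left out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DatasetUpdateSketch {
    // Sets the text of the first element whose local name is `tag`
    // and returns the updated XML as a string.
    static String setElementText(String xml, String tag, String value) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // "*" matches the element regardless of its namespace.
            NodeList matches = doc.getElementsByTagNameNS("*", tag);
            if (matches.getLength() == 0) {
                throw new IllegalArgumentException("No element named " + tag);
            }
            matches.item(0).setTextContent(value);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("Could not update the dataset XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical dataset; the real one must be taken from the PDF.
        String xml = "<xfa:datasets xmlns:xfa=\"http://www.xfa.org/schema/xfa-data/1.0/\">"
                + "<xfa:data><topmostSubform><Page2><p2_cb01>1</p2_cb01></Page2>"
                + "</topmostSubform></xfa:data></xfa:datasets>";
        System.out.println(setElementText(xml, "p2_cb01", "3"));
    }
}
```

Matching on the local name alone is a simplification: if the real dataset repeats an element name in several places, the lookup would need to follow the full path instead.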

Phew... This is one of the longest answers I ever wrote on StackOverflow.

Bruno Lowagie answered Nov 10 '22 04:11