
How to check / uncheck checkboxes in a PDF with Python (preferably PyPDF2)?

I have the code below

from PyPDF2 import PdfFileReader, PdfFileWriter

d = {
    "Name": "James",
    " Date": "1/1/2016",
    "City": "Wilmo",
    "County": "United States"
}

reader = PdfFileReader("medicareRRF.pdf")
inFields = reader.getFields()
watermark = PdfFileReader("justSign.pdf")

writer = PdfFileWriter()
page = reader.getPage(0)
page.mergePage(watermark.getPage(0))
writer.addPage(page)
written_page = writer.getPage(0)
writer.updatePageFormFieldValues(written_page, d)

This correctly fills in the PDF from the dictionary (d), but how can I check and uncheck boxes on the PDF? Here is the getFields() info for one of the boxes:

u'Are you ok': {'/FT': '/Btn','/Kids': [IndirectObject(36, 0),
IndirectObject(38, 0)],'/T': u'Are you ok','/V': '/No'}

I tried adding {'Are you ok' : '/Yes'} and several other similar ways, but nothing worked.

howMuchCheeseIsTooMuchCheese asked Feb 21 '16 16:02




2 Answers

I came across the same issue, looked in several places, and was disappointed that I couldn't find the answer. After a few frustrating hours looking at my code, the PyPDF2 code, and the Adobe PDF 1.7 spec, I finally figured it out.

If you debug into updatePageFormFieldValues, you'll see that it writes every value as a TextStringObject. Checkboxes are not text fields, and even their /V values are not text strings, which seemed counterintuitive, at least to me. Debugging into that function showed me that checkbox values are NameObjects instead, so I created my own function to handle them.

I create two dicts: one with only text values, which I pass to the built-in updatePageFormFieldValues function, and a second with only checkbox values. I also set /AS (the appearance state) to the same name so that the change is actually visible (see the PDF spec). My function looks like this:

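from PyPDF2.generic import NameObject

def updateCheckboxValues(page, fields):
    # fields maps a checkbox's /T name to the state name to set, e.g. "/1" or "/Off"
    for j in range(0, len(page['/Annots'])):
        writer_annot = page['/Annots'][j].getObject()
        for field in fields:
            if writer_annot.get('/T') == field:
                writer_annot.update({
                    # /V is the field value; /AS selects which appearance is drawn
                    NameObject("/V"): NameObject(fields[field]),
                    NameObject("/AS"): NameObject(fields[field])
                })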

However, as far as I can tell, whether you use /1, /On, or /Yes depends on how the form was defined or perhaps what the PDF reader is looking for. For me, /1 worked.
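
Rather than guessing between /1, /On and /Yes, it may be possible to read the name from the form itself. In the PDF spec, a checkbox widget's normal appearance dictionary (/AP, then /N) has one entry per state, usually /Off plus the "on" name. The helper below is a minimal sketch of that lookup. The function name checkbox_on_states is hypothetical (not part of PyPDF2), and it assumes /N is a dictionary of states, as it normally is for checkboxes.

```python
def checkbox_on_states(annot):
    """Return the appearance-state names of a widget other than /Off.

    `annot` is a resolved annotation dictionary (a PyPDF2 DictionaryObject
    or a plain dict). Assumes /AP -> /N is a dictionary keyed by state name.
    """
    if "/AP" not in annot:
        return []
    appearance = annot["/AP"]
    if "/N" not in appearance:
        return []
    normal = appearance["/N"]
    # Every key except /Off is a candidate "on" value for /V and /AS.
    return [str(name) for name in normal.keys() if name != "/Off"]


# Possible use with a PyPDF2 page (older releases spell get_object as getObject):
# for ref in page["/Annots"]:
#     print(checkbox_on_states(ref.get_object()))
```

Whatever name this reports for a given box is the value to try first in both /V and /AS.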

rpsip answered Nov 09 '22 23:11



I would like to add to @rpsip's answer.

from PyPDF2 import PdfReader, PdfWriter
from PyPDF2.generic import NameObject

reader = PdfReader("form2.pdf")  # PDF located in the current working directory
writer = PdfWriter()

fields = reader.get_fields()
print(fields)  # list the form fields to find the checkbox names

writer.add_page(reader.pages[0])  # the page must be added to the writer before saving
page = writer.pages[0]  # edit the writer's page so the changes end up in the output

for annot_ref in page["/Annots"]:
    annot = annot_ref.get_object()
    print(annot)  # shows which annotations are checkboxes ("/Btn") and which are text fields
    # Filter for button fields, then for the one checkbox to tick ("Check Box3").
    # .get() avoids a KeyError on annotations that have no /FT or /T entry.
    if annot.get("/FT") == "/Btn" and annot.get("/T") == "Check Box3":
        annot.update(
            {
                # Checkbox values are NameObjects; try "/Yes", "/1" or "/On" to see which works
                NameObject("/V"): NameObject("/Yes"),
                NameObject("/AS"): NameObject("/Yes"),
            }
        )

# Save the ticked PDF as a new file named "filled-out.pdf"
with open("filled-out.pdf", "wb") as output_stream:
    writer.write(output_stream)
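
One caveat for forms like the one in the question, where the field has /Kids: the widget annotations on the page may carry no /T of their own, in which case matching on /T finds nothing. In that layout the name usually sits on the dictionary referenced by /Parent. The helper below is a rough sketch of that lookup. The name field_name is hypothetical (not a PyPDF2 function), and it returns only the nearest /T rather than the fully qualified dotted name.

```python
def field_name(annot):
    """Return the annotation's /T, or the nearest ancestor's /T if it has none.

    `annot` is a resolved annotation dictionary (a PyPDF2 DictionaryObject
    or a plain dict); indexing a DictionaryObject resolves indirect references.
    """
    node = annot
    while node is not None:
        if "/T" in node:
            return node["/T"]
        # Walk up to the parent field, if there is one.
        node = node["/Parent"] if "/Parent" in node else None
    return None
```

For such grouped fields, the spec puts /V on the parent field and /AS on each kid widget, so the update would likely need to touch both objects.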

Hope this helps.

hts123 answered Nov 09 '22 23:11
