Batch fill PDF forms from python or bash

Tags: python, forms, pdf, automation

I have a PDF form that needs to be filled out many times (it's a timesheet, to be exact). Since I don't want to do this by hand, I'm looking for a way to fill it out using a Python script or tools that can be called from a bash script.

Does anyone have experience with this?

373

asked May 07 '12 03:05

McEnroe

2 Answers

For Python you'll need the fdfgen library and pdftk.

@Hugh Bothwell's comment is 100% correct, so I'll extend that answer with a working implementation.

If you're on Windows, you'll also need to make sure both Python and pdftk are on the system PATH (otherwise you have to type out their full folder paths).

Here's the code to auto-batch-fill a collection of PDF forms from a CSV data file:

import csv
import os
import subprocess

from fdfgen import forge_fdf

# Configuration: adjust these names for your own form.
filename_prefix = "NVC"
csv_file = "NVC.csv"
pdf_file = "NVC.pdf"
tmp_file = "tmp.fdf"
output_folder = "./output/"


def process_csv(path):
    """Return one list of (field_name, value) tuples per CSV data row."""
    data = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        headers = next(reader, [])  # the first row holds the PDF field names
        for row in reader:
            data.append(list(zip(headers, row)))
    return data


def form_fill(fields):
    """Fill the PDF template with one row of data by calling pdftk."""
    fdf = forge_fdf("", fields, [], [], [])
    # forge_fdf returns bytes on Python 3, so the file is opened in binary mode.
    with open(tmp_file, "wb") as fdf_file:
        fdf_file.write(fdf)
    # The value in the second CSV column is used to name the output file.
    output_file = "{0}{1} {2}.pdf".format(output_folder, filename_prefix, fields[1][1])
    try:
        # An argument list avoids shell-quoting problems with spaces in names.
        subprocess.run(
            ["pdftk", pdf_file, "fill_form", tmp_file, "output", output_file, "dont_ask"],
            check=True,
        )
    finally:
        os.remove(tmp_file)


data = process_csv(csv_file)
print("Generating Forms:")
print("-----------------------")
for fields in data:
    # Rows whose first column is 'Yes' are skipped (specific to this CSV).
    if fields[0][1] == "Yes":
        continue
    print("{0} {1} created...".format(filename_prefix, fields[1][1]))
    form_fill(fields)

Note: It shouldn't be rocket-surgery to figure out how to customize this. The initial variable declarations contain the custom configuration.

In the CSV, each column of the first row contains the name of the corresponding field in the PDF file. Any columns that don't have a corresponding field in the template are ignored.
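To make that layout concrete, here is a small sketch of how a header row and the data rows pair up into the (field name, value) tuples that the script hands to forge_fdf. The column names (Skip, Name, Hours) and the values are made up for illustration; your own CSV needs the field names from your PDF.

```python
import csv
import io

# Hypothetical CSV content; the header names stand in for real PDF field names.
sample_csv = "Skip,Name,Hours\nNo,Alice,40\nYes,Bob,0\n"

reader = csv.reader(io.StringIO(sample_csv))
headers = next(reader)  # first row: field names
rows = [list(zip(headers, row)) for row in reader]  # one tuple list per form

for fields in rows:
    print(fields)
```

With the script above, the first column decides whether a row is skipped and the second column ends up in the output file name, so the column order matters for those two.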

In the PDF template, just create editable fields where you want your data to fill and make sure the names match up with the CSV data.
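If you didn't create the template yourself and aren't sure what the fields are called, pdftk can list them with its dump_data_fields operation. The sketch below is one possible way to collect the names from Python (it assumes Python 3.7+ for capture_output and pdftk on the PATH); the helper names and the sample text are my own, not part of the script above.

```python
import subprocess


def parse_field_names(dump_text):
    """Collect the 'FieldName:' entries from pdftk dump_data_fields output."""
    names = []
    for line in dump_text.splitlines():
        if line.startswith("FieldName:"):
            names.append(line.split(":", 1)[1].strip())
    return names


def list_pdf_fields(pdf_path):
    """Run pdftk on pdf_path and return its form field names."""
    result = subprocess.run(
        ["pdftk", pdf_path, "dump_data_fields"],
        capture_output=True, text=True, check=True,
    )
    return parse_field_names(result.stdout)


# Abbreviated, made-up sample of the kind of text pdftk prints.
sample = (
    "---\nFieldType: Text\nFieldName: Name\nFieldFlags: 0\n"
    "---\nFieldType: Text\nFieldName: Hours\nFieldFlags: 0\n"
)
print(parse_field_names(sample))
```

Comparing the result of list_pdf_fields("NVC.pdf") with the CSV header row is a quick way to spot misspelled column names before generating anything.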

For this specific configuration, just put this file in the same folder as your NVC.csv, NVC.pdf, and a folder named 'output'. Run it and it automagically does the rest.
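Two things commonly go wrong with this setup: pdftk is not on the PATH, or the output folder does not exist yet. A minimal pre-flight check, assuming the same folder name as the configuration above, might look like this (the preflight helper is hypothetical, not part of the original script):

```python
import os
import shutil


def preflight(tool="pdftk", output_folder="./output/"):
    """Create the output folder and report whether the tool is on PATH."""
    os.makedirs(output_folder, exist_ok=True)
    return shutil.which(tool) is not None


if __name__ == "__main__":
    if not preflight():
        print("pdftk was not found on PATH; install it or add it to PATH first.")
```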

171

answered Oct 17 '22 21:10

Evan Plaice

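Much faster version, no pdftk or fdfgen needed, pure Python 3.6+ (it uses the PyPDF2 library; the camelCase names such as PdfFileReader belong to PyPDF2's pre-3.0 API):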

# -*- coding: utf-8 -*-

from collections import OrderedDict
from PyPDF2 import PdfFileWriter, PdfFileReader


def _getFields(obj, tree=None, retval=None, fileobj=None):
    """
    Extracts field data if this PDF contains interactive form fields.
    The *tree* and *retval* parameters are for recursive use.

    :param fileobj: A file object (usually a text file) to write
        a report to on all interactive form fields found.
    :return: A dictionary where each key is a field name, and each
        value is a :class:`Field<PyPDF2.generic.Field>` object. By
        default, the mapping name is used for keys.
    :rtype: dict, or ``None`` if form data could not be located.
    """
    fieldAttributes = {'/FT': 'Field Type', '/Parent': 'Parent', '/T': 'Field Name', '/TU': 'Alternate Field Name',
                       '/TM': 'Mapping Name', '/Ff': 'Field Flags', '/V': 'Value', '/DV': 'Default Value'}
    if retval is None:
        retval = OrderedDict()
        catalog = obj.trailer["/Root"]
        # get the AcroForm tree
        if "/AcroForm" in catalog:
            tree = catalog["/AcroForm"]
        else:
            return None
    if tree is None:
        return retval

    obj._checkKids(tree, retval, fileobj)
    for attr in fieldAttributes:
        if attr in tree:
            # Tree is a field
            obj._buildField(tree, retval, fileobj, fieldAttributes)
            break

    if "/Fields" in tree:
        fields = tree["/Fields"]
        for f in fields:
            field = f.getObject()
            obj._buildField(field, retval, fileobj, fieldAttributes)

    return retval


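def get_form_fields(infile):
    # Read inside a with-block so the file handle is closed afterwards.
    with open(infile, 'rb') as f:
        # _getFields returns None when the PDF has no AcroForm at all.
        fields = _getFields(PdfFileReader(f)) or {}
        return OrderedDict((k, v.get('/V', '')) for k, v in fields.items())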


def update_form_values(infile, outfile, newvals=None):
    pdf = PdfFileReader(open(infile, 'rb'))
    writer = PdfFileWriter()

    for i in range(pdf.getNumPages()):
        page = pdf.getPage(i)
        try:
            if newvals:
                writer.updatePageFormFieldValues(page, newvals)
            else:
                writer.updatePageFormFieldValues(page,
                                                 {k: f'#{i} {k}={v}'
                                                  for i, (k, v) in enumerate(get_form_fields(infile).items())
                                                  })
            writer.addPage(page)
        except Exception as e:
            print(repr(e))
            writer.addPage(page)

    with open(outfile, 'wb') as out:
        writer.write(out)


if __name__ == '__main__':
    from pprint import pprint

    pdf_file_name = '2PagesFormExample.pdf'

    pprint(get_form_fields(pdf_file_name))

    update_form_values(pdf_file_name, 'out-' + pdf_file_name)  # enumerate & fill the fields with their own names
    update_form_values(pdf_file_name, 'out2-' + pdf_file_name,
                       {'my_fieldname_1': 'My Value',
                        'my_fieldname_2': 'My Another 💎alue'})  # update the form fields
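The example above fills a single form. For the batch case in the question, one option is to turn each CSV row into a dictionary of field values and call update_form_values once per row. This is only a sketch: rows_to_jobs, the "out-" prefix, and the sample data are assumptions for illustration, and the CSV header names have to match the PDF's field names.

```python
import csv
import io


def rows_to_jobs(csv_text, name_column, prefix="out-"):
    """Yield (output_filename, field_values) pairs, one per CSV data row."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield "{}{}.pdf".format(prefix, row[name_column]), dict(row)


# Hypothetical data; in practice, read the text from your own CSV file.
sample_csv = "my_fieldname_1,my_fieldname_2\nAlice,40\nBob,38\n"
jobs = list(rows_to_jobs(sample_csv, "my_fieldname_1"))

for outfile, values in jobs:
    print(outfile, values)
    # With the functions above in scope, each job would be written with:
    # update_form_values('2PagesFormExample.pdf', outfile, values)
```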

answered Oct 17 '22 21:10

dvska
