How to export PDF form fields to XML automatically

Question

I have a PDF file containing form fields and need to export the data to an XML file automatically. Here is a screenshot of a sample form I created for testing:

[Screenshot: sample form]

Note: Exporting it manually works fine in Acrobat Professional: click Tools > Form > Export Form Data and choose XML as the output file type. This is the result I get when I export it manually:

<?xml version="1.0" encoding="UTF-8"?>
<fields>
    <first_name>John</first_name>
    <last_name>Doe</last_name>
</fields>

However, I need to automate this, e.g. with a Python script, a Java implementation, or a command-line tool. Which libraries or tools could I use to export form field data to XML? The tool or library should be open source so that I can integrate it into my workflow.

I already tried the Python pdfminer library, which let me export the static parts of the PDF file (such as Static form header, First name: and Last name:). But how do I export the form field data (in my case, the contents of the form fields first_name and last_name)?

EDIT: Feel free to download the sample.pdf file here.

jimmyp.smith · Accepted Answer

How about Apache PDFBox? It is open source and could fit your needs, since the website says "Extract forms data from PDF forms or prefill a PDF form."

EDIT: Check out the PrintFields example.
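
PDFBox is a Java library; for a Python workflow, a comparable approach might look like the sketch below. It is not the PrintFields example. It assumes Python 3 with the third-party pypdf package installed and relies on pypdf's `PdfReader.get_form_text_fields()` to return a mapping of text field names to values. The helper names `read_form_fields` and `fields_to_xml` are made up for illustration, and the XML layout simply mirrors the Acrobat export shown in the question.

```python
import xml.etree.ElementTree as ET


def read_form_fields(pdf_path):
    """Return {field_name: value} for the text fields of a PDF form.

    Assumption: the third-party pypdf package is installed (pip install pypdf).
    """
    from pypdf import PdfReader  # imported lazily; not part of the stdlib

    reader = PdfReader(pdf_path)
    # Maps each text field name to its current value (None if empty).
    return reader.get_form_text_fields()


def fields_to_xml(fields):
    """Serialize a {name: value} mapping as <fields><name>value</name>...</fields>.

    Assumption: every field name is also a valid XML element name;
    ElementTree does not check this.
    """
    root = ET.Element("fields")
    for name, value in fields.items():
        child = ET.SubElement(root, name)
        child.text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")
```

With the sample file, usage would be along the lines of `print(fields_to_xml(read_form_fields("sample.pdf")))`. Checkboxes, radio buttons and other non-text fields are not covered by this sketch.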

James Kingsbery · Answer

In bash, you can do this (at least with my version of these tools, less 444 and cat 8.13):

less ~/Downloads/sample.pdf | cat

I get output that looks like this:

Static form header

First name:   John

Last name:    Doe

You can then parse that output easily using Java, Python, awk, or whatever you prefer.

Alternatively, if you don't want to rely on the behavior of particular versions of these tools (I'm not sure whether they always do this), you can look at the source code of less to see how it does it.
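
The parsing step mentioned above could be sketched in Python roughly as follows, using only the standard library. It assumes the extracted text has one `Label: value` pair per line, as in the output shown. The rule for turning a visible label into an element name (lowercase, spaces to underscores) is an assumption that happens to match the sample; in general the printed labels need not correspond to the real field names.

```python
import re
import xml.etree.ElementTree as ET

# Matches lines such as "First name:   John" -> ("First name", "John")
LINE_RE = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")


def parse_label_lines(text):
    """Collect 'Label: value' pairs from extracted plain text."""
    fields = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            label, value = match.groups()
            # Assumed naming rule: lowercase, whitespace runs -> underscore
            name = re.sub(r"\s+", "_", label.lower())
            fields[name] = value
    return fields


def to_xml(fields):
    """Build <fields><name>value</name>...</fields> from the parsed pairs."""
    root = ET.Element("fields")
    for name, value in fields.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


sample = "Static form header\n\nFirst name:   John\n\nLast name:    Doe\n"
print(to_xml(parse_label_lines(sample)))
# prints <fields><first_name>John</first_name><last_name>Doe</last_name></fields>
```

Lines without a colon, such as the static header, are skipped, and so is a label whose field is empty.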

Tags: java, xml, python-2.7, acrobat, pdf-extraction

Michael
