I have a ready-made PDF document containing placeholder values in certain areas.
e.g. {{ first_name }}, {{ postcode }}, ...
I need to substitute these values using Python.
Any suggestions?
Enable the Auto-Complete option: choose Edit > Preferences (Windows) or Acrobat / Acrobat Reader > Preferences (Mac OS). Select Forms on the left. Under Auto-Complete, choose Basic or Advanced from the menu. Select Remember Numerical Data if you want the Auto-Complete memory to store numbers that you type into forms.
There are a couple of Python libraries you can use to extract data from PDFs. For example, PyPDF2 can extract text from PDFs where the text is laid out sequentially or in a structured way (i.e. in lines or forms), and the Camelot library can extract tables.
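Once the page text has been extracted (for example with PyPDF2), a regular expression can list which placeholders a page contains. Here is a minimal sketch that assumes the extracted text keeps each `{{ name }}` placeholder intact on one line; depending on how the PDF was generated, extraction can split or re-space them, in which case this will miss matches. The function name `find_placeholders` is just for illustration.

```python
import re

# {{ name }} with optional whitespace inside the braces, as in the question.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def find_placeholders(text: str) -> list[str]:
    """Return placeholder names in order of first appearance, without duplicates."""
    # dict.fromkeys keeps insertion order, so this de-duplicates while preserving order.
    return list(dict.fromkeys(PLACEHOLDER.findall(text)))


if __name__ == "__main__":
    # Stand-in for text extracted from a PDF page.
    page_text = "Dear {{ first_name }},\nPostcode: {{postcode}}\nBye {{ first_name }}"
    print(find_placeholders(page_text))
```

This only tells you what needs filling in; it does not write anything back to the PDF.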
It's a somewhat strange way to go about things, as PDFs aren't really designed to be modified. Depending on how your PDFs were generated, it may be very hard to do any replacement at all. You cannot easily alter any formatting, including line breaks, so this is only really useful if you have some sort of form where you know the values will fit.
pyPdf may allow you to extract the text, but I don't see a function to alter it while writing a second PDF. PDFedit will certainly allow you to make changes, and is scriptable, but I don't know about connecting it to Python. ReportLab only reads PDFs in the plus version, if I'm reading the page Joe Kington linked right.
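To show what "altering" the text means at a low level, here is a stdlib-only sketch that works on a single FlateDecode-compressed page content stream. It assumes each placeholder sits intact inside one PDF literal string `( ... )` and that the font uses a Latin-1-compatible encoding. Placeholders written as hex strings, split across `TJ` kerning arrays, or drawn with a subset/CID font will not match. It also does not rewrite the surrounding file: putting the new stream back requires updating its `/Length` and the cross-reference table, which is normally a PDF library's job. The helper names are hypothetical.

```python
import re
import zlib

# Byte-level version of the {{ name }} placeholder pattern.
PLACEHOLDER = re.compile(rb"\{\{\s*(\w+)\s*\}\}")


def escape_pdf_string(value: str) -> bytes:
    """Escape a value for use inside a PDF literal string ( ... )."""
    # Assumption: the font's encoding is compatible with Latin-1.
    raw = value.encode("latin-1")
    # Backslashes first, so the escapes added for parentheses are not doubled.
    return raw.replace(b"\\", b"\\\\").replace(b"(", b"\\(").replace(b")", b"\\)")


def fill_content_stream(compressed: bytes, values: dict[str, str]) -> bytes:
    """Decompress a FlateDecode content stream, substitute placeholders, recompress."""
    stream = zlib.decompress(compressed)

    def substitute(match: re.Match) -> bytes:
        key = match.group(1).decode("ascii")
        # Leave placeholders without a supplied value untouched.
        if key not in values:
            return match.group(0)
        return escape_pdf_string(values[key])

    return zlib.compress(PLACEHOLDER.sub(substitute, stream))


if __name__ == "__main__":
    # A tiny hand-written content stream standing in for one read from a page.
    content = b"BT /F1 12 Tf 72 720 Td (Dear {{ first_name }},) Tj ET"
    filled = fill_content_stream(zlib.compress(content), {"first_name": "Ann (Jr)"})
    print(zlib.decompress(filled))
```

Note that the replaced text is drawn at the same position with the same font, so a value longer than the placeholder can run into neighbouring content.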
I would advise reviewing why you have templates in PDF format at all. If you really do need to make changes to them from that format, take a look with PDFedit: there's no telling from this description what the structure of your documents is, and it might be very hard to locate the keywords.