
Add text to existing PDF document in Python

I'm trying to convert a PDF to a PNG image of the same size as the PDF, which is an A4 page.

convert my_pdf.pdf -density 300x300 -page A4 my_png.png

The resulting PNG file, however, is 595 px × 842 px, which is the size of an A4 page at 72 dpi rather than at 300 dpi. I was planning to use PIL to write some text onto some of the PDF's fields and then convert the image back to PDF, but for now the image is coming out at the wrong size.
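For reference, the arithmetic behind those numbers can be checked with a small sketch. It assumes an A4 page of 210 mm × 297 mm and 25.4 mm per inch; the function name `a4_pixels` is made up for illustration.

```python
# A4 dimensions in millimetres and the mm-per-inch conversion factor.
A4_MM = (210, 297)
MM_PER_INCH = 25.4


def a4_pixels(dpi):
    """Return the (width, height) of an A4 page in pixels at the given dpi."""
    return tuple(round(side / MM_PER_INCH * dpi) for side in A4_MM)


print(a4_pixels(72))   # (595, 842), the size the command produced
print(a4_pixels(300))  # (2480, 3508), the size expected at 300 dpi
```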

Edit: I was approaching the problem from the wrong angle. The correct approach doesn't involve ImageMagick at all.

Uku Loskit asked Jul 25 '11 16:07



2 Answers

After searching around for a while, I finally found the solution. It turns out that this was the correct approach after all, yet I feel the answer wasn't verbose enough. It appears that the poster probably took it from here (same variable names, etc.).

The idea: create a new blank PDF with ReportLab that contains only a text string, then merge it onto the existing page as a watermark using pyPdf.

from pyPdf import PdfFileWriter, PdfFileReader
import StringIO
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
packet = StringIO.StringIO()
# create a new PDF with Reportlab
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(100,100, "Hello world")
can.save()

# move to the beginning of the StringIO buffer
packet.seek(0)
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader(file("mypdf.pdf", "rb"))
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(0)
page.mergePage(new_pdf.getPage(0))
output.addPage(page)
# finally, write "output" to a real file
outputStream = file("/home/joe/newpdf.pdf", "wb")
output.write(outputStream)
outputStream.close()

Hope this helps somebody else.
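A note on choosing the numbers passed to drawString: ReportLab measures positions in points (1/72 inch) from the bottom-left corner of the page. Below is a minimal sketch of converting a position given in millimetres from the top-left corner, assuming an A4 page. The helper name `top_left_mm_to_points` is hypothetical and not part of ReportLab.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72
A4_HEIGHT_MM = 297  # assumption: the target page is A4


def top_left_mm_to_points(x_mm, y_mm, page_height_mm=A4_HEIGHT_MM):
    """Convert (x, y) in mm from the top-left corner to points from the bottom-left."""
    scale = POINTS_PER_INCH / MM_PER_INCH
    x_pt = x_mm * scale
    y_pt = (page_height_mm - y_mm) * scale  # flip the vertical axis
    return x_pt, y_pt


# One inch in from the left edge and one inch down from the top edge.
x, y = top_left_mm_to_points(25.4, 25.4)
print(x, y)
```

The resulting pair can be passed straight to can.drawString(x, y, "Hello world").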

Uku Loskit answered Sep 28 '22 09:09


I just tried the solution above, but I had quite some trouble getting it to run in Python 3, so I would like to share my modifications. The adapted code looks as follows:

from PyPDF2 import PdfFileWriter, PdfFileReader
import io
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

packet = io.BytesIO()

# create a new PDF with Reportlab
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(100, 100, "Hello world")
can.save()

# move to the beginning of the BytesIO buffer
packet.seek(0)
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader(open("mypdf.pdf", "rb"))
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(0)
page2 = new_pdf.getPage(0)
page.mergePage(page2)
output.addPage(page)
# finally, write "output" to a real file
with open("newpdf.pdf", "wb") as outputStream:
    output.write(outputStream)

With this code, page.mergePage throws an error. It turns out to be a porting error in PyPDF2. Please refer to this question for the solution: Porting to Python3: PyPDF2 mergePage() gives TypeError

Werner Trelawney answered Sep 28 '22 10:09