I am trying to modify text in a PDF file. The text can be in the operands of a Tj or BDC operator. I find the correct operations, and if I read them back directly after changing them, they show the updated values.
But if I pass the complete page to PdfFileWriter, the change is lost. I might be updating a copy and not the real object: I checked the id() and it was different. Does anyone have an idea how to fix this?

from PyPDF2 import PdfFileReader, PdfFileWriter
from PyPDF2.generic import TextStringObject, NameObject, ContentStream
from PyPDF2.utils import b_

reader = PdfFileReader("some.pdf")
writer = PdfFileWriter()

for page_idx in range(0, 1):
    # Get the current page and its contents
    page = reader.getPage(page_idx)
    content_object = page["/Contents"].getObject()
    content = ContentStream(content_object, reader)

    # Replace the text in every BDC and Tj operation
    for operands, operator in content.operations:
        if operator == b_("BDC"):
            operands[1][NameObject("/Contents")] = TextStringObject("xyz")
        if operator == b_("Tj"):
            operands[0] = TextStringObject("xyz")

    writer.addPage(page)

# Write the stream
with open("output.pdf", "wb") as fp:
    writer.write(fp)

The solution is to assign the ContentStream that is being iterated and changed back to the page before passing the page to the PdfFileWriter:

page[NameObject('/Contents')] = content
writer.addPage(page)
I found the solution reading this and this.
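
To illustrate why the change was lost, here is a minimal, library-free sketch. ToyStream and ToyContentStream are hypothetical stand-ins, not PyPDF2 classes; they only mimic the assumption that parsing a content stream builds a separate object, so the page keeps pointing at the original data until the parsed object is assigned back.

```python
# Toy stand-ins, NOT PyPDF2 classes: they only mimic the parse-then-modify pattern.
class ToyStream:
    """Holds raw content bytes, like the object a page's /Contents points to."""

    def __init__(self, data):
        self.data = data


class ToyContentStream:
    """Parses a stream into its own list of [operand, operator] pairs."""

    def __init__(self, stream):
        # Parsing creates new Python objects; nothing links them back to `stream`.
        self.operations = [line.rsplit(b" ", 1) for line in stream.data.splitlines()]

    def get_data(self):
        # Serialize the (possibly modified) operations back to bytes.
        return b"\n".join(b" ".join(op) for op in self.operations)


page = {"/Contents": ToyStream(b"(old text) Tj\n(more text) Tj")}
content = ToyContentStream(page["/Contents"])

# Modify only the parsed operations.
for op in content.operations:
    if op[1] == b"Tj":
        op[0] = b"(xyz)"

# The page still references the original, untouched stream.
unchanged = page["/Contents"].data

# The fix: assign the modified object back to the page.
page["/Contents"] = content
changed = page["/Contents"].get_data()
```

In this sketch, unchanged still holds the old text while changed holds the replaced text, which mirrors what happens when the modified ContentStream is not assigned back to the page.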