
PyPDF2: Why does PdfFileWriter forget changes I made to a document?

I am trying to modify text in a PDF file. The text can appear as an operand of a Tj or BDC operator. I find the correct operations, and if I read them back directly after changing them, they show the updated values.

But if I pass the complete page to PdfFileWriter, the change is lost. I suspect I am updating a copy rather than the real object: I checked the id() and it was different. Does anyone have an idea how to fix this?

from PyPDF2 import PdfFileReader, PdfFileWriter
from PyPDF2.generic import TextStringObject, NameObject, ContentStream
from PyPDF2.utils import b_

reader = PdfFileReader("some.pdf")
writer = PdfFileWriter()

for page_idx in range(0, 1):

    # Get the current page and its contents
    page = reader.getPage(page_idx)

    content_object = page["/Contents"].getObject()
    content = ContentStream(content_object, reader)

    for operands, operator in content.operations:

        if operator == b_("BDC"):

            operands[1][NameObject("/Contents")] = TextStringObject("xyz")

        if operator == b_("Tj"):

            operands[0] = TextStringObject("xyz")

    writer.addPage(page)


# Write the stream
with open("output.pdf", "wb") as fp:
    writer.write(fp)

Joe asked Sep 25 '18 13:09


1 Answer

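ContentStream(content_object, reader) parses the page's content into a new object, so edits made to content.operations never reach the page itself. The solution is to assign the modified ContentStream back to the page before passing the page to the PdfFileWriter: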

page[NameObject('/Contents')] = content
writer.addPage(page)

I found the solution reading this and this.
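
To illustrate why the reassignment matters, here is a small library-free sketch of the same situation. It is only a toy model: ToyParsedStream, serialize_page and the dict standing in for the page are hypothetical names made up for this example and are not how PyPDF2 is implemented. It just shows that parsing produces a fresh object, so a writer that looks at the page sees the edit only once the page points at that edited object.

```python
# Toy model only: these names are hypothetical and are NOT PyPDF2's implementation.


class ToyParsedStream:
    """Parses raw content bytes into a fresh list of [operand, operator] pairs."""

    def __init__(self, raw: bytes):
        # Parsing builds new Python objects; nothing links back to `raw`.
        self.operations = [line.rsplit(b" ", 1) for line in raw.splitlines()]

    def serialize(self) -> bytes:
        # Turn the (possibly edited) operations back into bytes.
        return b"\n".join(b" ".join(op) for op in self.operations)


def serialize_page(page: dict) -> bytes:
    """Stand-in for a writer: emits whatever the page currently references."""
    contents = page["/Contents"]
    if isinstance(contents, ToyParsedStream):
        return contents.serialize()
    return contents


# The "page" initially references the original, unparsed content.
page = {"/Contents": b"(old) Tj\n(keep) Tj"}

# Parse and edit, as in the question's loop over content.operations.
parsed = ToyParsedStream(page["/Contents"])
parsed.operations[0][0] = b"(xyz)"

before_fix = serialize_page(page)  # page still references the original bytes
page["/Contents"] = parsed         # the fix: point the page at the edited object
after_fix = serialize_page(page)
```

In this sketch before_fix still contains the old text and after_fix contains the new one, which mirrors what happens when page[NameObject('/Contents')] = content is added before writer.addPage(page).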

Joe answered Nov 15 '22 05:11