
Merging 2 PDF files gives me an empty PDF

Tags: python, pdf, pypdf

I am using the following standard code:

# importing required modules
import PyPDF2

def PDFmerge(pdfs, output):
    # creating pdf file merger object
    pdfMerger = PyPDF2.PdfFileMerger()

    # appending pdfs one by one
    for pdf in pdfs:
        with open(pdf, 'rb') as f:
            pdfMerger.append(f)

    # writing combined pdf to output pdf file
    with open(output, 'wb') as f:
        pdfMerger.write(f)

def main():
    # pdf files to merge
    pdfs = ['example.pdf', 'rotated_example.pdf']

    # output pdf file name
    output  = 'combined_example.pdf'

    # calling pdf merge function
    PDFmerge(pdfs = pdfs, output = output)

if __name__ == "__main__":
    # calling the main function
    main()

But when I call this with my two PDF files (which contain only some text), it produces an empty PDF file. What could be causing this?

HolyMonk asked Jan 28 '23 06:01


2 Answers

The problem is that you're closing the files before the write.

When you call pdfMerger.append, it doesn't actually read and process the whole file then; it only does so later, when you call pdfMerger.write. Since the files you've appended are closed, it reads no data from each of them, and therefore outputs an empty PDF.
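The effect of a deferred read can be reproduced without PyPDF2. The following is a toy model only: `LazyMerger` and `demo` are hypothetical names, and the class is not PyPDF2's actual implementation. It assumes a merger that merely stores the handle in `append` and silently skips sources it cannot read in `write`.

```python
import io
import os
import tempfile


class LazyMerger:
    """Toy stand-in (hypothetical) for a merger that defers reading to write()."""

    def __init__(self):
        self._sources = []

    def append(self, fileobj):
        # Only remember the handle; no bytes are read at this point.
        self._sources.append(fileobj)

    def write(self, out):
        for src in self._sources:
            try:
                out.write(src.read())
            except ValueError:
                # Reading a closed file raises ValueError. Swallowing it here
                # is an assumption that imitates the silent empty output.
                pass


def demo():
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(b"hello")
    try:
        # Pattern from the question: the handle is closed before write().
        early = LazyMerger()
        with open(path, "rb") as f:
            early.append(f)
        out_closed = io.BytesIO()
        early.write(out_closed)

        # The handle stays open until write() has finished.
        late = LazyMerger()
        out_open = io.BytesIO()
        with open(path, "rb") as f:
            late.append(f)
            late.write(out_open)
    finally:
        os.remove(path)
    return out_closed.getvalue(), out_open.getvalue()
```

Calling `demo()` returns the output of both variants, so the two can be compared directly: only the variant that keeps the handle open ends up with the file's contents.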

This should actually raise an exception, which would have made the problem and the fix obvious. Apparently this is a bug introduced in version 1.26, and it will be fixed in the next version. Unfortunately, while the fix was implemented in July 2016, there hasn't been a next version since May 2016. (See this issue.)

You could install directly from the GitHub master branch (and hope there aren't any new bugs), or you could continue to wait for 1.27, or you could work around the bug. How? Simple: just keep the files open until the write is done:

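import contextlib
import PyPDF2

def PDFmerge(pdfs, output):
    # ExitStack keeps every input file open until the block exits
    with contextlib.ExitStack() as stack:
        pdfMerger = PyPDF2.PdfFileMerger()
        files = [stack.enter_context(open(pdf, 'rb')) for pdf in pdfs]
        for f in files:
            pdfMerger.append(f)
        # the inputs are still open here, so the merger can read them
        with open(output, 'wb') as f:
            pdfMerger.write(f)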
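Another option, offered as a sketch rather than as part of the original answer, is to read each input fully into memory so that nothing depends on a file handle staying open. The helper names `read_into_memory` and `merge_in_memory` are hypothetical; the sketch assumes the same `PdfFileMerger` API as the question and that `append` accepts any seekable binary stream, such as `io.BytesIO`.

```python
import io


def read_into_memory(paths):
    """Return one BytesIO per path, each holding that file's full contents."""
    streams = []
    for path in paths:
        with open(path, "rb") as f:
            # The bytes are copied now, so closing the file afterwards is safe.
            streams.append(io.BytesIO(f.read()))
    return streams


def merge_in_memory(pdfs, output):
    # Imported here so the helper above also works without PyPDF2 installed.
    import PyPDF2

    merger = PyPDF2.PdfFileMerger()
    for stream in read_into_memory(pdfs):
        merger.append(stream)
    with open(output, "wb") as f:
        merger.write(f)
```

This trades memory for simplicity: every input is held in RAM until the merge finishes, which is fine for small text PDFs but may not suit very large files.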
abarnert answered Jan 31 '23 21:01


A workaround that works for me is to append a PdfFileReader instance instead of an open file object.

from PyPDF2 import PdfFileMerger
from PyPDF2 import PdfFileReader
merger = PdfFileMerger()
for f in ['file1.pdf', 'file2.pdf', 'file3.pdf']:
    # PdfFileReader accepts a file path directly; append()'s second
    # positional argument is a bookmark title, not a file mode
    merger.append(PdfFileReader(f))
with open('finished_copy.pdf', 'wb') as new_file:
    merger.write(new_file)

Hope that helps!

Lizzy Presland answered Jan 31 '23 21:01