I am using the following standard code:
# importing required modules
import PyPDF2

def PDFmerge(pdfs, output):
    # creating pdf file merger object
    pdfMerger = PyPDF2.PdfFileMerger()

    # appending pdfs one by one
    for pdf in pdfs:
        with open(pdf, 'rb') as f:
            pdfMerger.append(f)

    # writing combined pdf to output pdf file
    with open(output, 'wb') as f:
        pdfMerger.write(f)

def main():
    # pdf files to merge
    pdfs = ['example.pdf', 'rotated_example.pdf']

    # output pdf file name
    output = 'combined_example.pdf'

    # calling pdf merge function
    PDFmerge(pdfs=pdfs, output=output)

if __name__ == "__main__":
    # calling the main function
    main()
But when I call this with my two PDF files (which just contain some text), it produces an empty PDF file. What could be causing this?
The problem is that you're closing the files before the write.
When you call pdfMerger.append, it doesn't actually read and process the whole file then; it only does so later, when you call pdfMerger.write. Since the files you've appended are closed, it reads no data from each of them, and therefore outputs an empty PDF.
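To make the failure mode concrete, here is a toy model of deferred reading. LazyMerger is a hypothetical stand-in, not PyPDF2's actual implementation: it only stores the stream in append and reads it in write. With an ordinary Python file object, the late read on a closed file raises ValueError, which is the kind of error you'd expect here instead of a silently empty output.

```python
import os
import tempfile


class LazyMerger:
    """Hypothetical stand-in for a merger that defers reading until write()."""

    def __init__(self):
        self.streams = []

    def append(self, stream):
        # Only remember the stream; nothing is read yet.
        self.streams.append(stream)

    def write(self, out):
        # The actual reads happen here, so every stream must still be open.
        for stream in self.streams:
            out.write(stream.read())


def merge_closing_early(paths, output):
    """Mirrors the question's structure: each input is closed before write()."""
    merger = LazyMerger()
    for path in paths:
        with open(path, 'rb') as f:
            merger.append(f)  # f is closed as soon as this block exits
    with open(output, 'wb') as out:
        merger.write(out)  # tries to read from files that are already closed


def demo():
    """Return the error message raised by the late read, or None if none was raised."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'a.bin')
        with open(src, 'wb') as f:
            f.write(b'data')
        try:
            merge_closing_early([src], os.path.join(tmp, 'out.bin'))
        except ValueError as exc:
            return str(exc)
    return None
```

Calling demo() returns the ValueError message from the read on the closed file. The structure of merge_closing_early is the same as PDFmerge in the question, which is why the inputs are gone by the time the write happens.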
This should actually raise an exception, which would have made the problem and the fix obvious. Apparently this is a bug introduced in version 1.26, and it will be fixed in the next version. Unfortunately, while the fix was implemented in July 2016, there hasn't been a next version since May 2016. (See this issue.)
You could install directly off the GitHub master (and hope there aren't any new bugs), or you could continue to wait for 1.27, or you could work around the bug. How? Simple: just keep the files open until the write is done:
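import contextlib

# Replacement body for PDFmerge (pdfs and output are its parameters).
# ExitStack closes the input files only when this block exits, after the write.
with contextlib.ExitStack() as stack:
    pdfMerger = PyPDF2.PdfFileMerger()
    files = [stack.enter_context(open(pdf, 'rb')) for pdf in pdfs]
    for f in files:
        pdfMerger.append(f)
    with open(output, 'wb') as f:
        pdfMerger.write(f)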
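If ExitStack is unfamiliar, the following stdlib-only sketch shows the property the workaround relies on: files registered with enter_context stay open for the whole block and are all closed together when it exits. The helper names open_states and demo are made up for this illustration.

```python
import contextlib
import os
import tempfile


def open_states(paths):
    """Return (closed flags inside the ExitStack block, closed flags after it)."""
    with contextlib.ExitStack() as stack:
        files = [stack.enter_context(open(p, 'rb')) for p in paths]
        # Every file is still open here, however many were registered.
        inside = [f.closed for f in files]
    # Leaving the block closed all of them at once.
    after = [f.closed for f in files]
    return inside, after


def demo():
    """Create two small temporary files and report their open/closed states."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name in ('a.bin', 'b.bin'):
            path = os.path.join(tmp, name)
            with open(path, 'wb') as f:
                f.write(b'x')
            paths.append(path)
        return open_states(paths)
```

This is why the write has to sit inside the ExitStack block: once the block ends, the inputs are closed again.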
The workaround I have found that works uses an instance of PdfFileReader as the object to append.
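from PyPDF2 import PdfFileMerger, PdfFileReader

merger = PdfFileMerger()
for f in ['file1.pdf', 'file2.pdf', 'file3.pdf']:
    # PdfFileReader accepts a path and opens the file itself, so no mode is needed.
    # (A second positional argument to append() is the bookmark title, not a file mode.)
    merger.append(PdfFileReader(f))
with open('finished_copy.pdf', 'wb') as new_file:
    merger.write(new_file)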
Hope that helps!