 

Merge PDF files

Is it possible, using Python, to merge separate PDF files?

Assuming so, I need to extend this a little further. I am hoping to loop through folders in a directory and repeat this procedure.

And I may be pushing my luck, but is it possible to exclude a page that appears in each of the PDFs? (My report generation always creates an extra blank page.)

Asked by Btibert3 on Aug 9 '10 at 22:08





1 Answer

You can use PyPDF2's PdfMerger class.

File Concatenation

You can simply concatenate files by using the append method.

from PyPDF2 import PdfMerger

pdfs = ['file1.pdf', 'file2.pdf', 'file3.pdf', 'file4.pdf']

merger = PdfMerger()

for pdf in pdfs:
    merger.append(pdf)

merger.write("result.pdf")
merger.close()

You can pass file handles instead of file paths if you want.

File Merging

If you want more fine-grained control over merging, PdfMerger also has a merge method, which lets you specify an insertion point in the output file, meaning you can insert the pages anywhere in the file. The append method can be thought of as a merge where the insertion point is the end of the file.

e.g.

merger.merge(2, pdf) 

Here we insert the whole PDF into the output at page index 2 (0-based), i.e. after the first two pages already in the output.
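To make the insertion point concrete, here is a toy model that uses plain Python lists in place of pages. merge_at and append_at_end are hypothetical names, and the model assumes, as described above, that merge() splices the new pages in at a 0-based index; it is not PyPDF2's own code.

```python
def merge_at(output_pages, position, new_pages):
    """Toy model of merger.merge(position, pdf): splice `new_pages` in at 0-based `position`."""
    result = list(output_pages)
    result[position:position] = new_pages  # slice assignment inserts without replacing
    return result


def append_at_end(output_pages, new_pages):
    """append() behaves like merge() with the insertion point at the end of the output."""
    return merge_at(output_pages, len(output_pages), new_pages)


pages = ["a1", "a2", "a3"]
print(merge_at(pages, 2, ["b1", "b2"]))     # ['a1', 'a2', 'b1', 'b2', 'a3']
print(append_at_end(pages, ["b1", "b2"]))  # ['a1', 'a2', 'a3', 'b1', 'b2']
```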

Page Ranges

If you wish to control which pages are appended from a particular file, you can use the pages keyword argument of append and merge, passing a tuple of the form (start, stop[, step]), which works like the built-in range function.

e.g.

merger.append(pdf, pages=(0, 3))     # first 3 pages
merger.append(pdf, pages=(0, 6, 2))  # pages 1, 3, 5

If you specify an invalid range you will get an IndexError.

Note also that, to avoid files being left open, the merger's close method should be called once the merged file has been written. This ensures all files (input and output) are closed in a timely manner. It's a shame that the merger class (at least in the version this was written against) isn't implemented as a context manager, so we can't use the with keyword to avoid the explicit close call and get some easy exception safety.

You might also want to look at the pdfcat script provided as part of PyPDF2. You can potentially avoid the need to write code altogether.

The PyPDF2 GitHub repository also includes some example code demonstrating merging.

PyMuPDF

Another library perhaps worth a look is PyMuPDF, which seems to be actively maintained. Merging is equally simple.

From the command line:

python -m fitz join -o result.pdf file1.pdf file2.pdf file3.pdf 

and from code:

import fitz  # PyMuPDF

result = fitz.open()

for pdf in ['file1.pdf', 'file2.pdf', 'file3.pdf']:
    with fitz.open(pdf) as mfile:
        result.insert_pdf(mfile)

result.save("result.pdf")

There are plenty of further options, detailed in the project's wiki.

Answered by Paul Rooney on Oct 15 '22 at 11:10
