Does anybody have experience merging two pages of a PDF file into one using the Python library PyPDF2?
When I try page1.mergePage(page2), the result is page2 overlaid on page1. How can I add page2 to the bottom of page1 instead?
While searching the web for a Python PDF merging solution, I noticed a general misconception about merging versus appending.
Most people call the appending action a merge, but it's not. What you're describing in your question is really the intended use of mergePage, which should be called applyPageOnTopOfAnother, but that's a little long. What you are (were) looking for is really appending two files/pages into a new file.
Using the PdfFileMerger class and its append method
The documentation describes append as: "Identical to the merge() method, but assumes you want to concatenate all pages onto the end of the file instead of specifying a position."
Here's one way to do it, taken from "pypdf Merging multiple pdf files into one pdf":
from PyPDF2 import PdfFileMerger, PdfFileReader
# ...
merger = PdfFileMerger()
# Open each input in binary mode and append all of its pages
merger.append(PdfFileReader(open(filename1, 'rb')))
merger.append(PdfFileReader(open(filename2, 'rb')))
merger.write("document-output.pdf")
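To make the difference between append and merge concrete, here is a toy model that uses plain Python lists in place of PDF files. ToyMerger and its page labels are hypothetical names for illustration only; this is not PyPDF2 code, just a sketch of the page-ordering behaviour described in the documentation quote: merge inserts at a given position, append adds to the end.

```python
class ToyMerger:
    """Toy stand-in for a PDF merger: pages are just labels in a list."""

    def __init__(self):
        self.pages = []

    def merge(self, position, new_pages):
        # Insert the new pages so the first one lands at index `position`.
        self.pages[position:position] = list(new_pages)

    def append(self, new_pages):
        # Same as merge(), but the position is always the current end.
        self.merge(len(self.pages), new_pages)


merger = ToyMerger()
merger.append(["a1", "a2"])   # document A
merger.append(["b1", "b2"])   # document B goes after A
merger.merge(1, ["c1"])       # document C goes in after A's first page
print(merger.pages)           # ['a1', 'c1', 'a2', 'b1', 'b2']
```

Neither operation draws one page over another; both only change the order of whole pages in the output.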
And to append specific pages of different PDF files, use the PdfFileWriter class with the addPage method.
The documentation describes addPage as: "Adds a page to this PDF file. The page is usually acquired from a PdfFileReader instance."
from PyPDF2 import PdfFileReader, PdfFileWriter

file1 = PdfFileReader(open(filename1, "rb"))
file2 = PdfFileReader(open(filename2, "rb"))
output = PdfFileWriter()
# Copy one chosen page from each input into the new document
output.addPage(file1.getPage(specificPageIndex))
output.addPage(file2.getPage(specificPageIndex))
with open("document-output.pdf", "wb") as outputStream:
    output.write(outputStream)
Using mergePage
Merges the content streams of two pages into one. Resource references (i.e. fonts) are maintained from both pages. The mediabox/cropbox/etc of this page are not altered. The parameter page’s content stream will be added to the end of this page’s content stream, meaning that it will be drawn after, or “on top” of this page.
from PyPDF2 import PdfFileReader, PdfFileWriter

file1 = PdfFileReader(open(filename1, "rb"))
file2 = PdfFileReader(open(filename2, "rb"))
output = PdfFileWriter()
page = file1.getPage(specificPageIndex)
# Draw the second page's content on top of the first page
page.mergePage(file2.getPage(specificPageIndex))
output.addPage(page)
with open("document-output.pdf", "wb") as outputStream:
    output.write(outputStream)
If the two PDFs do not exist on your local machine and are instead normally accessed or downloaded via a URL (e.g. http://foo/bar.pdf and http://bar/foo.pdf), we can fetch both PDFs from their remote locations and merge them in memory in one fell swoop.
This eliminates the separate step of first saving the PDFs to disk, and it generalizes the solution from files on disk to any HTTP-accessible PDF.
The example:
import io

import requests
from flask import make_response
from PyPDF2 import PdfFileMerger, PdfFileReader

pdf_content_1 = requests.get('http://foo/bar.pdf').content
pdf_content_2 = requests.get('http://bar/foo.pdf').content
# Wrap the downloaded bytes in in-memory file-like buffers
pdf_buffer_1 = io.BytesIO(pdf_content_1)
pdf_buffer_2 = io.BytesIO(pdf_content_2)
pdf_merged_buffer = io.BytesIO()
merger = PdfFileMerger()
merger.append(PdfFileReader(pdf_buffer_1))
merger.append(PdfFileReader(pdf_buffer_2))
merger.write(pdf_merged_buffer)
# Option 1:
# Return the content of the buffer in an HTTP response
# (Flask example below; these lines belong inside a view function)
response = make_response(pdf_merged_buffer.getvalue())
# Set headers so web-browser knows to render results as PDF
response.headers['Content-Type'] = 'application/pdf'
response.headers['Content-Disposition'] = \
    'attachment; filename=%s.pdf' % 'Merged PDF'
return response
# Option 2: Write to disk (binary mode, since PDF content is bytes)
with open("merged_pdf.pdf", "wb") as fp:
    fp.write(pdf_merged_buffer.getvalue())
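A small sketch of the in-memory buffer step on its own, assuming Python 3's io.BytesIO. The byte string below is a made-up placeholder standing in for downloaded PDF content, not a valid PDF. It illustrates that passing the bytes to the constructor yields a readable file-like object positioned at the start, whereas write() returns a byte count rather than the buffer.

```python
import io

# Placeholder bytes standing in for requests.get(...).content (not a real PDF)
fake_pdf_bytes = b"%PDF-1.4 placeholder"

# Passing the bytes to the constructor gives a buffer positioned at offset 0
buffer_ok = io.BytesIO(fake_pdf_bytes)
header = buffer_ok.read(5)

# write() returns the number of bytes written, not the buffer itself,
# so its result cannot be handed to a PDF reader
write_result = io.BytesIO().write(fake_pdf_bytes)

print(header, write_result)
```

If a buffer has already been written to, calling seek(0) on it before reading is another way to get back to the start.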
Did it this way:
import PyPDF2
from PyPDF2 import PdfFileWriter

reader = PyPDF2.PdfFileReader(open("input.pdf", 'rb'))
NUM_OF_PAGES = reader.getNumPages()
page0 = reader.getPage(0)
h = page0.mediaBox.getHeight()
w = page0.mediaBox.getWidth()
# One tall blank page that can hold every page stacked vertically
newpdf_page = PyPDF2.pdf.PageObject.createBlankPage(None, w, h*NUM_OF_PAGES)
for i in range(NUM_OF_PAGES):
    next_page = reader.getPage(i)
    # PDF origin is bottom-left, so page 0 gets the largest y offset (top)
    newpdf_page.mergeScaledTranslatedPage(next_page, 1, 0, h*(NUM_OF_PAGES-i-1))
writer = PdfFileWriter()
writer.addPage(newpdf_page)
with open('output.pdf', 'wb') as f:
    writer.write(f)
This works when every page has the same height and width. Otherwise, it needs some modifications.
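For pages of differing sizes, one possible modification is to compute the canvas size and per-page offsets up front. The helper below is a hypothetical sketch (stack_layout is not part of PyPDF2) that works on plain (width, height) tuples: the canvas is as wide as the widest page and as tall as all pages combined, and each page gets a (tx, ty) translation with the first page at the top, since PDF coordinates start at the bottom-left.

```python
def stack_layout(sizes):
    """Return (canvas_width, canvas_height, offsets) for stacking pages
    top to bottom. `sizes` is a list of (width, height) tuples."""
    canvas_width = max(w for w, _ in sizes)
    canvas_height = sum(h for _, h in sizes)
    offsets = []
    remaining = canvas_height
    for _, h in sizes:
        remaining -= h          # bottom edge of this page on the canvas
        offsets.append((0, remaining))
    return canvas_width, canvas_height, offsets


# Example: a 612x792 page followed by a half-height 612x396 page
layout = stack_layout([(612, 792), (612, 396)])
print(layout)  # (612, 1188, [(0, 396), (0, 0)])
```

The canvas dimensions could then replace w and h*NUM_OF_PAGES in the createBlankPage call, and each (tx, ty) pair could be passed as the translation arguments to mergeScaledTranslatedPage.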
Maybe Emile Bergeron's solution is better; I didn't try it.