How to append PDF pages using PyPDF2

Tags: python, pdf, pdf-generation, pypdf

Does anybody have experience merging two pages of a PDF file into one using the Python library PyPDF2? When I try page1.mergePage(page2), the result is page2 overlaid on page1. How can I make it add page2 to the bottom of page1 instead?

692

asked Apr 01 '14 19:04

Valentin Melnikov

3 Answers

While searching the web for a Python PDF merging solution, I noticed a general confusion between merging and appending.

Most people call the appending action a merge, but it's not. What you're describing in your question is really the intended use of mergePage, which should be called applyPageOnTopOfAnother, but that's a little long. What you are (were) looking for is really appending two files/pages into a new file.

Appending PDF files

Use the PdfFileMerger class and its append method.

Identical to the merge() method, but assumes you want to concatenate all pages onto the end of the file instead of specifying a position.

Here's one way to do it, taken from "pypdf Merging multiple pdf files into one pdf":

from PyPDF2 import PdfFileMerger, PdfFileReader

# ...

merger = PdfFileMerger()

# open() replaces the Python 2-only file() builtin
merger.append(PdfFileReader(open(filename1, 'rb')))
merger.append(PdfFileReader(open(filename2, 'rb')))

merger.write("document-output.pdf")

Appending specific PDF pages

To append specific pages from different PDF files, use the PdfFileWriter class with the addPage method.

Adds a page to this PDF file. The page is usually acquired from a PdfFileReader instance.

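from PyPDF2 import PdfFileReader, PdfFileWriter

# open() replaces the Python 2-only file() builtin
file1 = PdfFileReader(open(filename1, "rb"))
file2 = PdfFileReader(open(filename2, "rb"))

output = PdfFileWriter()

# Page indices are zero-based
output.addPage(file1.getPage(specificPageIndex))
output.addPage(file2.getPage(specificPageIndex))

# The with block closes the output file even if write() fails
with open("document-output.pdf", "wb") as outputStream:
    output.write(outputStream)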

Merging two pages into one page

Use mergePage.

Merges the content streams of two pages into one. Resource references (i.e. fonts) are maintained from both pages. The mediabox/cropbox/etc of this page are not altered. The parameter page’s content stream will be added to the end of this page’s content stream, meaning that it will be drawn after, or “on top” of this page.

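from PyPDF2 import PdfFileReader, PdfFileWriter

# open() replaces the Python 2-only file() builtin
file1 = PdfFileReader(open(filename1, "rb"))
file2 = PdfFileReader(open(filename2, "rb"))

output = PdfFileWriter()

# Draw the page from file2 on top of the page from file1
page = file1.getPage(specificPageIndex)
page.mergePage(file2.getPage(specificPageIndex))

output.addPage(page)

# The with block closes the output file even if write() fails
with open("document-output.pdf", "wb") as outputStream:
    output.write(outputStream)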

196

answered Sep 24 '22 20:09

Emile Bergeron

If the two PDFs do not exist on your local machine and are instead normally downloaded via a URL (e.g. http://foo/bar.pdf and http://bar/foo.pdf), we can fetch both PDFs from their remote locations and merge them in memory in one fell swoop.

This eliminates the step of saving each PDF to disk first, and it generalizes the solution beyond the simple case of both PDFs existing on disk to any HTTP-accessible PDF.

The example:

    import io

    import requests
    from flask import make_response  # only needed for Option 1
    from PyPDF2 import PdfFileMerger, PdfFileReader

    pdf_content_1 = requests.get('http://foo/bar.pdf').content
    pdf_content_2 = requests.get('http://bar/foo.pdf').content

    # Wrap the downloaded bytes in in-memory file-like buffers
    pdf_buffer_1 = io.BytesIO(pdf_content_1)
    pdf_buffer_2 = io.BytesIO(pdf_content_2)
    pdf_merged_buffer = io.BytesIO()

    merger = PdfFileMerger()
    merger.append(PdfFileReader(pdf_buffer_1))
    merger.append(PdfFileReader(pdf_buffer_2))
    merger.write(pdf_merged_buffer)

    # Option 1:
    # Return the content of the buffer in an HTTP response
    # (Flask example; call this from inside a view function)
    def merged_pdf_response():
        response = make_response(pdf_merged_buffer.getvalue())
        # Set headers so web-browser knows to render results as PDF
        response.headers['Content-Type'] = 'application/pdf'
        response.headers['Content-Disposition'] = (
            'attachment; filename=%s.pdf' % 'Merged PDF')
        return response

    # Option 2: Write to disk (binary mode, since the buffer holds bytes)
    with open("merged_pdf.pdf", "wb") as fp:
        fp.write(pdf_merged_buffer.getvalue())
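
One detail about in-memory buffers is easy to get wrong, so here is a minimal sketch of it using only the standard library. In Python 3, `write()` on a buffer returns the number of bytes written rather than the buffer itself, so chaining it onto the constructor loses the buffer. The helper name `to_buffer` and the placeholder bytes are illustrative assumptions, not part of PyPDF2.

```python
import io


def to_buffer(data: bytes) -> io.BytesIO:
    """Wrap raw bytes in a seekable file-like buffer positioned at the start."""
    return io.BytesIO(data)


# Placeholder content for illustration only; this is not a valid PDF.
fake_pdf_bytes = b"%PDF-1.4 placeholder bytes, not a real PDF"

# Pitfall: write() returns a byte count, so this is an int, not a buffer.
not_a_buffer = io.BytesIO().write(fake_pdf_bytes)

# Building the buffer directly from the bytes keeps a reference to it and
# leaves the read position at 0, which is where a reader starts.
buffer = to_buffer(fake_pdf_bytes)
```

Passing `buffer` (rather than `not_a_buffer`) to a reader is what lets it treat the downloaded bytes like an opened file.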

answered Sep 24 '22 20:09

The Aelfinn

I did it this way:

import PyPDF2

reader = PyPDF2.PdfFileReader(open("input.pdf", 'rb'))

NUM_OF_PAGES = reader.getNumPages()

# Use the first page's size for every slot on the tall page
page0 = reader.getPage(0)
h = page0.mediaBox.getHeight()
w = page0.mediaBox.getWidth()

# One blank page tall enough to hold all pages stacked vertically
newpdf_page = PyPDF2.pdf.PageObject.createBlankPage(None, w, h*NUM_OF_PAGES)
for i in range(NUM_OF_PAGES):
    next_page = reader.getPage(i)
    # PDF y grows upward, so page 0 gets the largest offset (the top slot)
    newpdf_page.mergeScaledTranslatedPage(next_page, 1, 0, h*(NUM_OF_PAGES-i-1))

writer = PyPDF2.PdfFileWriter()
writer.addPage(newpdf_page)

with open('output.pdf', 'wb') as f:
    writer.write(f)

This works when every page has the same height and width. Otherwise, it needs some modifications.

Maybe Emile Bergeron's solution is better; I didn't try it.
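
On the modifications needed when pages differ in size: most of the work is computing the size of the combined page and a vertical offset for each source page. The sketch below does only that arithmetic in plain Python, with no PyPDF2 calls. The function name `stack_layout` is a hypothetical helper, and it assumes each page's media box has its lower-left corner at the origin and that pages are left-aligned.

```python
from typing import List, Tuple


def stack_layout(
    sizes: List[Tuple[float, float]],
) -> Tuple[float, float, List[Tuple[float, float]]]:
    """Compute a layout for stacking pages top to bottom on one tall page.

    sizes holds the (width, height) of each page in reading order.
    Returns (canvas_width, canvas_height, offsets), where offsets[i] is the
    (tx, ty) translation for page i. PDF coordinates start at the
    bottom-left, so the first page receives the largest ty.
    """
    canvas_w = max(w for w, _ in sizes)   # wide enough for the widest page
    canvas_h = sum(h for _, h in sizes)   # tall enough for all pages
    offsets = []
    y = canvas_h
    for _, h in sizes:
        y -= h                            # step down by this page's height
        offsets.append((0.0, y))          # left-aligned at x = 0
    return canvas_w, canvas_h, offsets


# Example: a US Letter page followed by an A4 page (sizes in points).
width, height, offsets = stack_layout([(612, 792), (595, 842)])
```

The resulting `width` and `height` would be the arguments for the blank page, and each `(tx, ty)` pair would go to the translation arguments of the merge call for the corresponding page. For pages of identical size, the offsets reduce to the `h*(NUM_OF_PAGES-i-1)` expression used above.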

answered Sep 23 '22 20:09

adsurbum
