Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

split a multi-page pdf file into multiple pdf files with python?

Tags:

python

pdf

I would like to take a multi-page pdf file and create separate pdf files per page.

I have downloaded reportlab and have browsed the documentation, but it seems aimed at pdf generation. I haven't yet seen anything about processing PDF files themselves.

Is there an easy way to do this in python?

like image 1000
monkut Avatar asked Jan 29 '09 01:01

monkut


People also ask

How do I automate a PDF Split?

Split A PDF File After Every Page Using A Desktop FlowCreate a new flow in Power Automate Desktop and call it Split PDF Document. We want the user to select a PDF file to be split. Add a Display select file dialog action with the title “Select PDF To Split.” Then set another variable called CurrentPage.


4 Answers

from PyPDF2 import PdfFileWriter, PdfFileReader

inputpdf = PdfFileReader(open("document.pdf", "rb"))

for i in range(inputpdf.numPages):
    output = PdfFileWriter()
    output.addPage(inputpdf.getPage(i))
    with open("document-page%s.pdf" % i, "wb") as outputStream:
        output.write(outputStream)

etc.

like image 186
user26294 Avatar answered Oct 14 '22 22:10

user26294


I missed here a solution where you split the PDF to two parts consisting of all pages so I append my solution if somebody was looking for the same:

from PyPDF2 import PdfFileWriter, PdfFileReader

def split_pdf_to_two(filename,page_number):
    pdf_reader = PdfFileReader(open(filename, "rb"))
    try:
        assert page_number < pdf_reader.numPages
        pdf_writer1 = PdfFileWriter()
        pdf_writer2 = PdfFileWriter()

        for page in range(page_number):
            pdf_writer1.addPage(pdf_reader.getPage(page))

        for page in range(page_number,pdf_reader.getNumPages()):
            pdf_writer2.addPage(pdf_reader.getPage(page))

        with open("part1.pdf", 'wb') as file1:
            pdf_writer1.write(file1)

        with open("part2.pdf", 'wb') as file2:
            pdf_writer2.write(file2)

    except AssertionError as e:
        print("Error: The PDF you are cutting has less pages than you want to cut!")
like image 42
DovaX Avatar answered Oct 14 '22 22:10

DovaX


The PyPDF2 package gives you the ability to split up a single PDF into multiple ones.

import os
from PyPDF2 import PdfFileReader, PdfFileWriter

pdf = PdfFileReader(path)
for page in range(pdf.getNumPages()):
    pdf_writer = PdfFileWriter()
    pdf_writer.addPage(pdf.getPage(page))

    output_filename = '{}_page_{}.pdf'.format(fname, page+1)

    with open(output_filename, 'wb') as out:
        pdf_writer.write(out)

    print('Created: {}'.format(output_filename))

Source: https://www.blog.pythonlibrary.org/2018/04/11/splitting-and-merging-pdfs-with-python/

like image 7
Nikita Jain Avatar answered Oct 14 '22 23:10

Nikita Jain


I know that the code is not related to python, however i felt like posting this piece of R code which is simple, flexible and works amazingly. The PDFtools package in R is amazing in splitting merging PDFs at ease.

library(pdftools) #Rpackage
pdf_subset('D:\\file\\20.02.20\\22 GT 2017.pdf',
           pages = 1:51, output = "subset.pdf")
like image 1
sandilya M Avatar answered Oct 14 '22 22:10

sandilya M