PyPDF2 split pdf by pages

Question

I want to split a PDF file using PyPDF2.

All the examples I found online are too complicated, don't work, or always raise the error "AttributeError: 'PdfFileWriter' object has no attribute 'stream'".

Can someone help with this? I need to separate one PDF with 3 pages into three different files.

I'm starting from this:

import PyPDF2

pdfFileObj = open(r"D:\BPO\act.pdf", 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
pdfWriter = PyPDF2.PdfFileWriter()
pdfWriter.addPage(pdfReader.getPage(0))

But I don't know what to do next :(

EDIT#1

I tried splitting in a loop and ran into a problem: PdfFileWriter produces 3 files, the first with one page, the second with two, and the third with three. Where is my mistake in the following code?

from PyPDF2 import PdfFileReader, PdfFileWriter

act_sub_pages_name = ['p01.pdf', 'p02.pdf', 'p03.pdf']
with open(r"D:\BPO\act.pdf", 'rb') as act_mls:
    reader = PdfFileReader(act_mls)
    writer = PdfFileWriter()
    if reader.numPages == 3:
        counter = 0
        for x in range(3):
            path = '\\'.join([r'D:\BPO\act sub pages', act_sub_pages_name[counter]])
            counter += 1
            writer.addPage(reader.getPage(x))
            with open(path, 'wb') as outfile:
                writer.write(outfile)

Sorry for my bad English.

EDIT#2

My solution, based on Paul Rooney's answer:

import os
from PyPDF2 import PdfFileReader, PdfFileWriter

# Raw strings keep the backslashes literal ('\a' would otherwise be a bell character).
act_pdf_file = r'D:\BPO\act.pdf'
act_sub_pages_name = ['p01.pdf', 'p02.pdf', 'p03.pdf']

def pdf_splitter(index, src_file):
    # Write the page at `index` of src_file to its own PDF.
    with open(src_file, 'rb') as act_mls:
        reader = PdfFileReader(act_mls)
        writer = PdfFileWriter()
        writer.addPage(reader.getPage(index))
        out_file = os.path.join(r'D:\BPO\act sub pages', act_sub_pages_name[index])
        with open(out_file, 'wb') as out_pdf:
            writer.write(out_pdf)

for x in range(3):
    pdf_splitter(x, act_pdf_file)

With a function everything works properly, but it is a little more complicated.

Paul Rooney · Accepted Answer

You can use the write method of the PdfFileWriter to write out to the file.

from PyPDF2 import PdfFileReader, PdfFileWriter

with open("input.pdf", 'rb') as infile:

    reader = PdfFileReader(infile)
    writer = PdfFileWriter()
    writer.addPage(reader.getPage(0))

    with open('output.pdf', 'wb') as outfile:
        writer.write(outfile)

To split the whole file, loop over the pages of the input file and, for each page, create a new writer object, add that single page, and write it out. Use an ever-incrementing filename, or some other scheme, to decide each output filename.
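A minimal sketch of that loop, using the same PyPDF2 names as the code in this thread (PdfFileReader, PdfFileWriter, addPage, getPage, numPages). PyPDF2 3.x renamed these to PdfReader, PdfWriter, add_page and reader.pages, so adjust for your installed version. The helper names page_filename and split_pdf and the 'p01.pdf' naming pattern are illustrative choices, not part of the library.

```python
import os


def page_filename(index, prefix="p", ext=".pdf"):
    """Build a zero-padded, 1-based name such as 'p01.pdf' for page index 0."""
    return "{}{:02d}{}".format(prefix, index + 1, ext)


def split_pdf(src_path, out_dir):
    """Write every page of src_path to its own PDF inside out_dir."""
    # Imported here so page_filename() stays usable without PyPDF2 installed.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    written = []
    with open(src_path, "rb") as infile:
        reader = PdfFileReader(infile)
        for index in range(reader.numPages):
            # A fresh writer per page: reusing one writer keeps the earlier pages.
            writer = PdfFileWriter()
            writer.addPage(reader.getPage(index))
            out_path = os.path.join(out_dir, page_filename(index))
            # Write while the input file is still open; pages are read lazily.
            with open(out_path, "wb") as outfile:
                writer.write(outfile)
            written.append(out_path)
    return written
```

Calling split_pdf(r"D:\BPO\act.pdf", r"D:\BPO\act sub pages") should then produce one file per page. Creating the writer inside the loop is what prevents the second and third files from accumulating earlier pages.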

cs95 · Answer

I've used a tool called xpdf for just this sort of task, and it works really well. It can be downloaded from the Xpdf project website.

It's a command-line utility that you can call from Python. Make sure it's added to your PATH so you can call it from the command line.

Here's how you can call it from Python, using subprocess:
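One way to confirm the PATH setup before going further is to ask Python where it would find the executable. This is only a sketch: find_tool is a hypothetical helper name, and it relies on the standard library's shutil.which.

```python
import shutil


def find_tool(name="pdftotext"):
    """Return the full path of a command-line tool found on PATH, or None."""
    return shutil.which(name)


location = find_tool()
if location is None:
    print("pdftotext was not found on PATH")
else:
    print("pdftotext found at", location)
```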

import subprocess

# The raw string keeps the backslashes literal; the trailing "-" sends the text to stdout.
text, _ = subprocess.Popen(r'pdftotext -fixed 0 -clip D:\BPO\act.pdf -',
                           shell=True,
                           stdout=subprocess.PIPE).communicate()

pages = text.decode('latin-1').split('\f')

Pages are separated by form feed characters, so you'll get a list containing the text of each page.
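The splitting step can be tried without a PDF at hand. In this sketch, split_pages is a hypothetical helper and the sample string is a made-up stand-in for real pdftotext output; it also drops the empty last item that appears if the text ends with a form feed.

```python
def split_pages(text):
    """Split pdftotext-style output into per-page strings on form feeds."""
    pages = text.split("\f")
    # Drop the empty trailing piece left when the text ends with a form feed.
    if pages and pages[-1] == "":
        pages.pop()
    return pages


sample = "first page\fsecond page\fthird page\f"  # made-up stand-in for real output
print(split_pages(sample))
```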

Tags: python, pypdf2

Acamori
