I want to split a PDF file using PyPDF2.
All the examples I found online are too complicated, don't work, or always fail with the error "AttributeError: 'PdfFileWriter' object has no attribute 'stream'".
Can someone help with this? I need to separate one PDF with 3 pages into three different files.
This is what I have so far:
import PyPDF2

pdfFileObj = open(r"D:\BPO\act.pdf", 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
pdfWriter = PyPDF2.PdfFileWriter()
pdfWriter.addPage(pdfReader.getPage(0))
But I don't know what to do next.
EDIT#1
I tried splitting in a loop and ran into a problem: PdfFileWriter produces 3 files, the first with one page, the second with two, and the third with three. Where is the mistake in the following code?
act_sub_pages_name = ['p01.pdf', 'p02.pdf', 'p03.pdf']
with open(r"D:\BPO\act.pdf", 'rb') as act_mls:
    reader = PdfFileReader(act_mls)
    writer = PdfFileWriter()
    if reader.numPages == 3:
        counter = 0
        for x in range(3):
            path = '\\'.join(['D:\\BPO\\act sub pages', act_sub_pages_name[counter]])
            counter += 1
            writer.addPage(reader.getPage(x))
            with open(path, 'wb') as outfile:
                writer.write(outfile)
Sorry for my bad English.
EDIT#2
My solution, based on Paul Rooney's answer:
import os
from PyPDF2 import PdfFileReader, PdfFileWriter

act_pdf_file = 'D:\\BPO\\act.pdf'
act_sub_pages_name = ['p01.pdf', 'p02.pdf', 'p03.pdf']

def pdf_splitter(index, src_file):
    with open(src_file, 'rb') as act_mls:
        reader = PdfFileReader(act_mls)
        # A new writer per call, so each output file gets exactly one page
        writer = PdfFileWriter()
        writer.addPage(reader.getPage(index))
        out_file = os.path.join('D:\\BPO\\act sub pages', act_sub_pages_name[index])
        with open(out_file, 'wb') as out_pdf:
            writer.write(out_pdf)

for x in range(3):
    pdf_splitter(x, act_pdf_file)
With a function everything works properly, but it is a little more complicated.
You can use the write method of the PdfFileWriter to write out to the file.
from PyPDF2 import PdfFileReader, PdfFileWriter

with open("input.pdf", 'rb') as infile:
    reader = PdfFileReader(infile)
    writer = PdfFileWriter()
    writer.addPage(reader.getPage(0))

    with open('output.pdf', 'wb') as outfile:
        writer.write(outfile)
You may want to loop over the pages of the input file, creating a new writer object for each one and adding a single page to it. Then write each page out to an incrementing filename, or use some other scheme for choosing the output filename.
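A minimal sketch of that loop, assuming a PyPDF2 release older than 3.0 (the one that still has PdfFileReader, PdfFileWriter, getPage and addPage, as used in the question). The names page_filenames and split_pdf and the p01.pdf naming scheme are only illustrative, and the output folder is assumed to exist already.

```python
import os


def page_filenames(count, prefix="p", ext=".pdf"):
    """Return zero-padded names such as p01.pdf, p02.pdf for `count` pages."""
    return ["{}{:02d}{}".format(prefix, n, ext) for n in range(1, count + 1)]


def split_pdf(src_path, out_dir):
    """Write every page of src_path to its own file in out_dir; return the paths."""
    # Imported here so the naming helper above works without PyPDF2 installed.
    # Assumption: PyPDF2 older than 3.0, which still provides these names.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    written = []
    with open(src_path, "rb") as infile:
        reader = PdfFileReader(infile)
        for index, name in enumerate(page_filenames(reader.numPages)):
            # A fresh writer per page, so pages do not pile up across files.
            writer = PdfFileWriter()
            writer.addPage(reader.getPage(index))
            out_path = os.path.join(out_dir, name)
            with open(out_path, "wb") as outfile:
                writer.write(outfile)
            written.append(out_path)
    return written
```

With the paths from the question this would be called as split_pdf(r"D:\BPO\act.pdf", r"D:\BPO\act sub pages"). Creating the writer inside the loop is what avoids the one, two, three page files described in EDIT#1, where a single writer kept every page added so far.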
I've used a tool called xpdf for just this sort of task and it works really well. You can download it here.
It's a command-line utility that you can call from Python. Make sure it's added to your PATH so you can call it from the command line.
Here's how you can call it from Python using subprocess:
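import subprocess

# The trailing "-" tells pdftotext to send the text to stdout instead of a .txt file
text, _ = subprocess.Popen('pdftotext -fixed 0 -clip D:\\BPO\\act.pdf -',
                           shell=True,
                           stdout=subprocess.PIPE).communicate()
# Each page of extracted text ends with a form feed character
pages = text.decode('latin-1').split('\f')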
Pages are separated by formfeed characters, so you'll get a list of pages.
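To show what that split produces, here is a small sketch that runs on made-up bytes instead of real pdftotext output. The helper name split_pages is hypothetical, and it assumes the tool also writes a form feed after the last page, which would otherwise leave an empty string at the end of the list.

```python
def split_pages(raw, encoding="latin-1"):
    """Decode pdftotext-style output and return one string per page."""
    pages = raw.decode(encoding).split("\f")
    # Drop the empty trailing entry left by a form feed after the last page.
    if pages and pages[-1] == "":
        pages.pop()
    return pages


# Made-up sample standing in for the bytes read from stdout.
sample = b"first page\fsecond page\fthird page\f"
print(split_pages(sample))  # ['first page', 'second page', 'third page']
```

Note that this gives you the text of each page, not three separate PDF files.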