
How to delete pages from pdf file using Python?

Tags: python, pdf

I have some .pdf files with more than 500 pages each, but I need only a few pages from each file. The document's title pages must be preserved. I know exactly which page numbers the program should remove. How can I do this in a Python 2.7 environment installed on top of MS Visual Studio?

Alexander asked Sep 19 '16 13:09



1 Answer

Try using PyPDF2.

Instead of deleting pages, create a new document and add only the pages that you want to keep.

Some sample code (originally adapted from BinPress, which is now dead; an archived copy exists):

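    from PyPDF2 import PdfFileWriter, PdfFileReader

    pages_to_keep = [1, 2, 10]  # page numbering starts from 0
    # A path string is accepted directly; the second positional
    # argument is `strict`, not a file mode, so 'rb' is dropped.
    infile = PdfFileReader('source.pdf')
    output = PdfFileWriter()

    # Copy only the wanted pages into the new document.
    for i in pages_to_keep:
        p = infile.getPage(i)
        output.addPage(p)

    with open('newfile.pdf', 'wb') as f:
        output.write(f)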

or, if it is easier to list the pages to remove:

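    from PyPDF2 import PdfFileWriter, PdfFileReader

    pages_to_delete = [3, 4, 5]  # page numbering starts from 0
    # A path string is accepted directly; the second positional
    # argument is `strict`, not a file mode, so 'rb' is dropped.
    infile = PdfFileReader('source.pdf')
    output = PdfFileWriter()

    # Copy every page except the ones marked for deletion.
    for i in range(infile.getNumPages()):
        if i not in pages_to_delete:
            p = infile.getPage(i)
            output.addPage(p)

    with open('newfile.pdf', 'wb') as f:
        output.write(f)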
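
Since page numbering in the snippets above starts from 0, while the page numbers you know are probably the ones printed in the document (starting from 1), an off-by-one mistake is easy to make. The following is only a minimal sketch of a helper that does the conversion; `indices_to_keep` is a hypothetical name, not part of PyPDF2, and it assumes the printed page numbers match the physical page order. It runs on both Python 2.7 and Python 3.

```python
def indices_to_keep(num_pages, pages_to_delete):
    """Return zero-based indices of the pages to keep, in document order.

    num_pages       -- total number of pages (e.g. infile.getNumPages())
    pages_to_delete -- 1-based page numbers, as a human would count them
    """
    # Convert human page numbers (1, 2, 3, ...) to zero-based indices.
    delete = set(n - 1 for n in pages_to_delete)

    # Fail early if a page number lies outside the document.
    bad = sorted(i + 1 for i in delete if not 0 <= i < num_pages)
    if bad:
        raise ValueError("page numbers out of range: %s" % bad)

    return [i for i in range(num_pages) if i not in delete]


# Hypothetical 10-page document; remove pages 4, 5 and 6 (1-based).
print(indices_to_keep(10, [4, 5, 6]))  # [0, 1, 2, 6, 7, 8, 9]
```

The returned list can be used in place of `pages_to_keep` in the first snippet, so the title pages survive as long as their numbers are not in the delete list.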
Maximilian Peters answered Oct 02 '22 16:10