I have a PDF page object with an image and a lot of text.
I want to remove that image and also remove some text objects based on their contents. That is, I want to get the contents of all text objects, then remove the ones that satisfy a condition.
How can I do that with PyPDF2? Or is there another library which allows doing that?
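To remove all images from a PDF file using PyPDF2, you can do:
from PyPDF2 import PdfFileWriter, PdfFileReader

# Note: these are the older camelCase PyPDF2 names; later releases renamed them.
# The with-block closes both files even if an error occurs.
with open("src.pdf", "rb") as input_stream, open("dst.pdf", "wb") as output_stream:
    src = PdfFileReader(input_stream)
    output = PdfFileWriter()
    # Copy every page of the source into the writer
    for i in range(src.getNumPages()):
        output.addPage(src.getPage(i))
    # Remove images from the pages held by the writer
    output.removeImages()
    output.write(output_stream)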
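For the text part of the question, one possible approach is to filter the page's content-stream operations. PyPDF2's ContentStream exposes them as a list of (operands, operator) pairs, and text is drawn by the Tj, TJ, ' and " operators. The sketch below only models that filtering step on plain Python data so it can run without a PDF. The names filter_text_operations, operand_text and should_remove are hypothetical helpers, not part of PyPDF2, and wiring the result back into a real page is left out. Also note that in real files the strings may use a font-specific encoding, so the decoded text will not always match what you see on the page.

```python
# Operators that draw text in a PDF content stream
TEXT_SHOW_OPERATORS = {b"Tj", b"TJ", b"'", b'"'}


def operand_text(operands, operator):
    """Join the string pieces of a text-showing operation into one str."""
    if operator == b"TJ":
        # TJ takes a single array mixing strings and kerning numbers
        pieces = [p for p in operands[0] if isinstance(p, (str, bytes))]
    else:
        pieces = [p for p in operands if isinstance(p, (str, bytes))]
    # Assumption: latin-1 is good enough for this illustration
    return "".join(
        p.decode("latin-1") if isinstance(p, bytes) else p for p in pieces
    )


def filter_text_operations(operations, should_remove):
    """Return the operations with matching text-showing operations dropped."""
    kept = []
    for operands, operator in operations:
        if operator in TEXT_SHOW_OPERATORS and should_remove(
            operand_text(operands, operator)
        ):
            continue  # skip this text object
        kept.append((operands, operator))
    return kept


# Made-up operations in the (operands, operator) shape described above
operations = [
    ([], b"BT"),
    (["/F1", 12], b"Tf"),
    (["Hello"], b"Tj"),
    ([["CONFI", -20, "DENTIAL"]], b"TJ"),
    ([], b"ET"),
]

kept = filter_text_operations(operations, lambda text: "CONFIDENTIAL" in text)
```

Here the TJ operation is dropped because its pieces join to "CONFIDENTIAL", while the "Hello" Tj operation and the non-text operators are kept.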