I have code that hides parts of a PDF by covering them with a white polygon. The problem is that the text is still there: if you press Ctrl+F you can still find it.
My goal is to actually remove the text from the PDF itself. Using pdfminer I managed to extract the text from the PDF, but I don't know if it's possible to "replace" the text with, say, just some empty spaces. Is such a thing possible using Python? Extracting the text isn't enough; I need it to be removed from the PDF.
`extractText()` extracts the text from a PDF page, and `pdfFileObj.close()` closes the PDF file object. The replacement text would simply be "", since you want to remove every instance of a certain piece of text.
Erase text in a PDF: click the "Edit" tab at the top right to enable editing mode, then click the text block you want to delete. You can press either the "Backspace" key or the "Delete" key on your keyboard.
With the PDF file open, press "Ctrl+F" and enter the text you would like to replace. Then type the new text into the Replace field and click "Replace" to replace the matching text in the PDF.
This is kind of memory intensive, but you can copy everything in the PDF except the part you are removing, then overwrite the file with the new version, which no longer contains that part. With PyPDF you can do this by retrieving a page's content stream, then finding and removing the relevant parts.
PyPDF docs: https://pythonhosted.org/PyPDF2/PageObject.html?highlight=getcontents#PyPDF2.pdf.PageObject.getContents
PDF standard: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf (pages 78 and 81)
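As a rough illustration of "finding and removing the relevant parts" of a content stream, here is a minimal sketch that uses only the standard library. It assumes you already have the page's content stream as decompressed bytes, and that the text is stored as plain literal strings shown with the `Tj` or `TJ` operators. Many real PDFs use hex strings, subsetted fonts with custom encodings, nested parentheses, or text split across several operators, and this sketch handles none of those. The function name `remove_text` is made up for this example and is not part of PyPDF.

```python
import re

# A PDF literal string: "(...)" with backslash escapes.
# Assumption: no nested unescaped parentheses inside the string.
_LITERAL = rb"\((?:\\.|[^\\()])*\)"

# "(text) Tj" shows one string; "[(te) -20 (xt)] TJ" shows an array of
# strings interleaved with kerning adjustments.
_SHOW_TEXT = re.compile(
    rb"(?P<single>" + _LITERAL + rb")\s*Tj\b"
    rb"|\[(?P<array>(?:[^\[\]()]|" + _LITERAL + rb")*)\]\s*TJ\b",
    re.DOTALL,
)


def remove_text(stream: bytes, target: bytes) -> bytes:
    """Blank every text-showing operation whose string contains `target`.

    Hypothetical helper: it empties the whole operand rather than cutting
    out only the matching substring, which keeps the stream syntax valid.
    """

    def _replace(match):
        if match.group("single") is not None:
            shown = match.group("single")[1:-1]  # strip the parentheses
            return b"() Tj" if target in shown else match.group(0)
        # TJ array: join the string pieces so text split by kerning is found.
        pieces = re.findall(_LITERAL, match.group("array"), flags=re.DOTALL)
        shown = b"".join(piece[1:-1] for piece in pieces)
        return b"[] TJ" if target in shown else match.group(0)

    return _SHOW_TEXT.sub(_replace, stream)


# A tiny hand-written content stream used only for demonstration.
stream = (
    b"BT /F1 12 Tf 72 720 Td (Public line) Tj "
    b"0 -14 Td (Secret code 1234) Tj "
    b"0 -14 Td [(Sec) -20 (ret) 15 ( again)] TJ ET"
)
cleaned = remove_text(stream, b"Secret")
print(cleaned.decode("latin-1"))
```

Getting the stream bytes out of the page and writing the modified stream back into a new file is left to the PDF library and is not shown here. Because the text is removed from the stream itself rather than covered up, a Ctrl+F search on the rewritten page should no longer find it, provided the assumptions above hold for your file.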