How to replace/delete text from a PDF using Python?

I have code that hides parts of a PDF by covering them with a white polygon, but the text is still there: if you press Ctrl+F you can still find it.

My goal is to actually remove the text from the PDF itself. Using pdfminer I managed to extract the text, but I don't know whether it's possible to "replace" it with, say, empty spaces. Is this possible in Python? Extracting the text isn't enough; I need it removed from the PDF.

asked Sep 15 '18 17:09

Wallace

1 Answer

This is somewhat memory-intensive, but you can copy everything in the PDF except the part you want to remove, then overwrite the file with the new version, which no longer contains that part. You can do this with PyPDF2 by retrieving each page's content stream (getContents) and then finding and removing the relevant parts.

PyPDF2 docs (PageObject.getContents): https://pythonhosted.org/PyPDF2/PageObject.html?highlight=getcontents#PyPDF2.pdf.PageObject.getContents

PDF standard (PDF 32000-1:2008), pp. 78 and 81: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf
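To make the "find and remove the relevant parts" step more concrete, here is a minimal sketch that is not part of the original answer. It uses only the standard library and works on content-stream bytes that are assumed to be already decompressed; reading the stream from a page and writing it back is left to the PDF library. The function name redact_literal_text and the sample stream are hypothetical. It also assumes the text is drawn with the Tj operator from a literal string in a simple single-byte encoding, so hex strings, TJ arrays (kerned fragments), nested unescaped parentheses, and fonts with custom encodings are not handled.

```python
import re

# Matches a literal string operand followed by the Tj (show text) operator,
# e.g. b"(Hello world) Tj". Backslash escapes such as \( and \) are allowed.
# Assumption: no unescaped nested parentheses, no hex strings, no TJ arrays.
SHOW_TEXT = re.compile(rb"\(((?:\\.|[^\\()])*)\)(\s*)Tj", re.DOTALL)


def redact_literal_text(content: bytes, target: bytes) -> bytes:
    """Blank out `target` wherever it appears inside a (...) Tj operand."""

    def blank(match: "re.Match[bytes]") -> bytes:
        text = match.group(1)
        if target not in text:
            return match.group(0)  # leave unrelated text untouched
        # Replace with the same number of spaces so the byte length is unchanged.
        cleaned = text.replace(target, b" " * len(target))
        return b"(" + cleaned + b")" + match.group(2) + b"Tj"

    return SHOW_TEXT.sub(blank, content)


# Hypothetical, hand-written content stream used only for illustration.
sample = (
    b"BT /F1 12 Tf 72 720 Td (Account: 12345) Tj ET\n"
    b"BT /F1 12 Tf 72 700 Td (Public note) Tj ET"
)
print(redact_literal_text(sample, b"12345").decode("latin-1"))
```

Unlike covering the text with a white polygon, this changes the bytes that a search would read, so the blanked string should no longer be findable in the edited stream. Whether it works on a given file depends on how that file encodes its text.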

answered Sep 20 '22 13:09

Xander Bielby

Tags:

python

python-3.x

pdf