Find text position in PDF file

I have a PDF file and want to find a specific piece of text in it and highlight it using Python. I found PyPDF2, which can highlight part of a page when given the coordinates of the region to highlight.

I am looking for a tool that can give me the position of a given text in the PDF.

976

asked Nov 26 '17 14:11

Simdan

1 Answer

PyMuPDF can find the coordinates of a given text string on a page. You can use this in conjunction with the PyPDF2 highlighting method to accomplish what you're describing, or you can just use PyMuPDF to highlight the text as well.

Here is sample code for finding text and highlighting with PyMuPDF:

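    import fitz  # PyMuPDF

    ### READ IN PDF
    doc = fitz.open("input.pdf")

    for page in doc:
        ### SEARCH
        text = "Sample text"
        # Returns a list of rectangles, one per match on this page.
        # (Older PyMuPDF releases spelled this page.searchFor.)
        text_instances = page.search_for(text)

        ### HIGHLIGHT
        for inst in text_instances:
            # (Older PyMuPDF releases spelled this page.addHighlightAnnot.)
            highlight = page.add_highlight_annot(inst)
            highlight.update()

    ### OUTPUT
    doc.save("output.pdf", garbage=4, deflate=True, clean=True)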
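If you go the PyPDF2 route instead, one thing to watch: PyMuPDF reports rectangles with the origin at the top-left of the page and y growing downward, while native PDF coordinates put the origin at the bottom-left. Below is a minimal sketch of the flip. The helper name `to_pdf_rect` is made up for illustration, and it assumes an unrotated page whose visible area starts at (0, 0); rotated or cropped pages would need more care. The page height can be read in PyMuPDF as `page.rect.height`.

```python
from typing import Sequence, Tuple


def to_pdf_rect(rect: Sequence[float], page_height: float) -> Tuple[float, float, float, float]:
    """Flip a top-left-origin rectangle (x0, y0, x1, y1) into bottom-left-origin PDF space.

    Hypothetical helper: assumes an unrotated page whose visible area starts at (0, 0).
    """
    x0, y0, x1, y1 = rect
    # The top edge (y0) becomes the larger PDF y; the bottom edge (y1) becomes the smaller.
    return (x0, page_height - y1, x1, page_height - y0)


# Example: a match near the top of a US Letter page (792 pt tall).
print(to_pdf_rect((72.0, 100.0, 200.0, 112.0), 792.0))
# -> (72.0, 680.0, 200.0, 692.0)
```

The function accepts any 4-item sequence, so a rectangle returned by the page search can be passed in directly.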

195

answered Oct 15 '22 18:10

Cilantro Ditrek
