I have a PDF file, and I am trying to find a specific piece of text in it and highlight it using Python. I found PyPDF2, which can highlight part of a PDF when given the coordinates of the area to highlight.
What I still need is a tool that can give me the position of a given piece of text in the PDF.
Try running "Preflight..." in Acrobat and choosing PDF Analysis -> List page objects, grouped by type of object. If you locate the text objects in the results list, you will see a position value (in points) under the Text Properties -> * Font section.
In the Search window, select All PDF Documents In. From the pop-up menu directly below this option, choose Browse For Location. Select the location, either on your computer or on a network, and click OK. To specify additional search criteria, click Show Advanced Options, and specify the options.
PyMuPDF can search for text and return the coordinates of each match. You can pass those coordinates to the PyPDF2 highlighting method to do what you describe, or you can simply use PyMuPDF for the highlighting as well.
Here is sample code for finding text and highlighting with PyMuPDF:
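```python
import fitz  # PyMuPDF

# search_for / add_highlight_annot are the current method names;
# older PyMuPDF releases used the camelCase searchFor / addHighlightAnnot.

# Read in the PDF
doc = fitz.open("input.pdf")
text = "Sample text"

for page in doc:
    # Search: returns a list of Rect objects, one per match on this page
    text_instances = page.search_for(text)

    # Highlight each match
    for inst in text_instances:
        highlight = page.add_highlight_annot(inst)
        highlight.update()

# Write the output, compacting and cleaning the file
doc.save("output.pdf", garbage=4, deflate=True, clean=True)
```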
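If you would rather hand the coordinates to PyPDF2 (or another tool) than highlight with PyMuPDF, keep in mind that PyMuPDF measures y downward from the top-left corner of the page, whereas PDF user space measures y upward from the bottom-left. The sketch below flips the y axis accordingly. It assumes an unrotated page whose MediaBox starts at (0, 0) and that PyMuPDF is installed; the helper names `to_pdf_coords` and `find_text_positions` are hypothetical and not part of either library.

```python
def to_pdf_coords(rect, page_height):
    """Convert a top-left-origin rectangle (x0, y0, x1, y1), as returned by
    PyMuPDF's search_for, into PDF's bottom-left-origin convention.
    Assumes an unrotated page whose MediaBox starts at (0, 0)."""
    x0, y0, x1, y1 = rect
    return (x0, page_height - y1, x1, page_height - y0)


def find_text_positions(pdf_path, text):
    """Yield (page_number, pdf_rect) for every match of `text` in the file."""
    import fitz  # PyMuPDF; imported here so to_pdf_coords works without it
    with fitz.open(pdf_path) as doc:
        for page in doc:
            for rect in page.search_for(text):
                yield page.number, to_pdf_coords(tuple(rect), page.rect.height)
```

For example, on a US Letter page (792 pt tall), a match at (10, 20, 110, 40) in PyMuPDF coordinates becomes (10, 752, 110, 772) in PDF coordinates.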