
Is there a way to highlight given words in a PDF document via Python?

I have some keywords that I obtained earlier, and I want to search for them in a PDF document via Python and highlight them. Is this feasible with a library such as pdfMiner?

erogol asked Sep 09 '13 00:09


1 Answer

Yes, you can use the PyMuPDF library. Install it with pip install PyMuPDF.

Then use the following code:

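import fitz  # PyMuPDF

### READ IN PDF

doc = fitz.open(r"D:\XXXX\XXX.pdf")
page = doc[0]  # first page only

text = "Amey"
# search_for returns one rectangle per match (older PyMuPDF releases: searchFor)
text_instances = page.search_for(text)

### HIGHLIGHT

for inst in text_instances:
    print(inst, type(inst))
    # older PyMuPDF releases: addHighlightAnnot
    highlight = page.add_highlight_annot(inst)


### OUTPUT

# Recent PyMuPDF versions may refuse a non-incremental save over the file that
# is currently open; if this raises, pass a different output path here.
doc.save(r"D:\XXXX\XXX.pdf", garbage=4, deflate=True, clean=True)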
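
The snippet above handles a single word on the first page only. To cover several keywords across every page, as the question asks, a minimal sketch could look like the following. It assumes a recent PyMuPDF where the methods are named search_for and add_highlight_annot (older releases spell them searchFor and addHighlightAnnot). The function names highlight_keywords and highlight_pdf are made up for illustration and are not part of the library.

```python
def highlight_keywords(doc, keywords):
    """Highlight every occurrence of each keyword on every page of doc.

    doc is expected to be an open PyMuPDF document (hypothetical helper,
    not a library function). Returns the number of highlights added.
    """
    count = 0
    for page in doc:
        for word in keywords:
            # search_for returns one rectangle per match on this page
            for rect in page.search_for(word):
                page.add_highlight_annot(rect)
                count += 1
    return count


def highlight_pdf(in_path, out_path, keywords):
    """Open in_path, highlight the keywords, and write the result to out_path."""
    import fitz  # PyMuPDF; imported here so highlight_keywords has no hard dependency

    doc = fitz.open(in_path)
    try:
        added = highlight_keywords(doc, keywords)
        # Writing to a separate output path avoids saving over the open file
        doc.save(out_path, garbage=4, deflate=True, clean=True)
    finally:
        doc.close()
    return added
```

A call such as highlight_pdf("input.pdf", "output.pdf", ["Amey", "Naik"]) would then write a highlighted copy and return how many matches were marked; both file names here are placeholders.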
Amey P Naik answered Sep 18 '22 01:09