
Find text position in PDF file

I have a PDF file and want to find a specific piece of text in it and highlight it using Python. I found PyPDF2, which can highlight part of a page when given the coordinates of the region to highlight.

I am looking for a tool that can give me the position of a given piece of text in the PDF.

Simdan asked Nov 26 '17 14:11


1 Answer

PyMuPDF can search a page for text and return the coordinates of each match. You can pass these coordinates to the PyPDF2 highlighting method to accomplish what you're describing, or you can use PyMuPDF alone to highlight the text.
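If you take the first route and hand the coordinates to another library, the two coordinate systems may not match. PyMuPDF measures from the top-left corner of the page with y growing downward, while PDF's native coordinate system has its origin at the bottom-left with y growing upward. Below is a minimal sketch of the conversion. The helper name to_pdf_coords is made up for illustration, and it assumes an unrotated page whose origin is at (0, 0), with page_height being the page height in points.

```python
def to_pdf_coords(rect, page_height):
    """Flip a top-left-origin rectangle into bottom-left-origin coordinates.

    rect is (x0, y0, x1, y1) in points, measured from the top-left corner.
    Assumes an unrotated page whose origin is at (0, 0).
    """
    x0, y0, x1, y1 = rect
    # The top edge (y0) becomes the upper y value and the bottom edge (y1)
    # the lower one, so the result is still ordered (left, bottom, right, top).
    return (x0, page_height - y1, x1, page_height - y0)


# A match found 100 pt below the top of a 792 pt tall (US Letter) page:
print(to_pdf_coords((72, 100, 200, 120), 792))  # (72, 672, 200, 692)
```

Whether this flip is needed depends on what the receiving library expects, so check a single known rectangle before converting everything.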

Here is sample code for finding text and highlighting with PyMuPDF:

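    import fitz  # PyMuPDF

    # Read in the PDF
    doc = fitz.open("input.pdf")

    text = "Sample text"
    for page in doc:
        # Search: returns one rectangle per match on this page
        # (this method is named searchFor in older PyMuPDF releases)
        text_instances = page.search_for(text)

        # Highlight each match (addHighlightAnnot in older releases)
        for inst in text_instances:
            highlight = page.add_highlight_annot(inst)
            highlight.update()

    # Output
    doc.save("output.pdf", garbage=4, deflate=True, clean=True)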
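If you only need the positions, as the question asks, the same search call can be used without adding any annotations. The sketch below collects one tuple per match. The function names collect_hits and find_text_positions are illustrative, not part of PyMuPDF, and it assumes a PyMuPDF release that provides page.search_for.

```python
def collect_hits(pages, needle):
    """Return (page_index, x0, y0, x1, y1) for every match of needle.

    pages is any iterable of objects with a PyMuPDF-style
    search_for(text) method, for example an open fitz document.
    """
    hits = []
    for page_index, page in enumerate(pages):
        for rect in page.search_for(needle):
            hits.append((page_index, rect.x0, rect.y0, rect.x1, rect.y1))
    return hits


def find_text_positions(pdf_path, needle):
    """Open pdf_path with PyMuPDF and list the position of each match."""
    # Imported here so collect_hits can be used without PyMuPDF installed.
    import fitz  # PyMuPDF

    doc = fitz.open(pdf_path)
    try:
        return collect_hits(doc, needle)
    finally:
        doc.close()
```

Calling find_text_positions("input.pdf", "Sample text") returns a list of tuples, one per match. The page index is zero-based, and the coordinates are in points measured from the top-left corner of the page.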
Cilantro Ditrek answered Oct 15 '22 18:10