How to find the x,y location of text in a PDF

Is there any tool to find the X-Y location of text content in a PDF file?

raki asked Jan 19 '11 20:01


3 Answers

The Docotic.Pdf library can do it. See the C# sample below:

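using System;
using BitMiracle.Docotic.Pdf;

using (PdfDocument doc = new PdfDocument("your_pdf.pdf"))
{
    // Print the position and text of every text chunk on the first page.
    foreach (PdfTextData textData in doc.Pages[0].Canvas.GetTextData())
        Console.WriteLine(textData.Position + " " + textData.Text);
}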
Vitaliy Shibaev answered Oct 12 '22 10:10


Try running "Preflight..." in Acrobat and choosing PDF Analysis -> List page objects, grouped by type of object.

If you locate the text objects within the results list, you will notice there is a position value (in points) within the Text Properties -> * Font section.
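Since the position is reported in points, you may want to convert it to other units or to a top-left origin. Here is a minimal Python sketch, assuming the value is in default PDF user space (1 point = 1/72 inch, origin at the bottom-left corner of the page). The function names, the A4 page height, and the sample position are made up for illustration, and I have not checked which origin Preflight actually uses.

```python
POINTS_PER_INCH = 72.0  # default PDF user space unit
MM_PER_INCH = 25.4


def points_to_mm(value_pt):
    """Convert a length in PDF points to millimetres."""
    return value_pt / POINTS_PER_INCH * MM_PER_INCH


def to_top_left_origin(x_pt, y_pt, page_height_pt):
    """Flip y so it is measured downwards from the top edge of the page."""
    return x_pt, page_height_pt - y_pt


# Hypothetical text position on an A4 page (height about 841.89 pt).
x_pt, y_pt = 72.0, 769.89
x_top, y_top = to_top_left_origin(x_pt, y_pt, 841.89)
print(f"x = {points_to_mm(x_top):.1f} mm, y = {points_to_mm(y_top):.1f} mm from the top-left corner")
```

If the page sets a different unit or is rotated, these numbers will be off, so treat this as a starting point only.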

Orbling answered Oct 12 '22 10:10


TET, the Text Extraction Toolkit from the PDFlib family of products, can do that. TET has a command-line interface, and it's the most powerful of all text extraction tools I'm aware of. (It can even handle ligatures...)

Geometry
TET provides precise metrics for the text, such as the position on the page, glyph widths, and text direction. Specific areas on the page can be excluded or included in the text extraction, e.g. to ignore headers and footers or margins.
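To illustrate the idea of excluding headers and footers by position, here is a rough Python sketch. It is not TET's API: the `TextFragment` class, the `in_body` helper, the margin sizes, and the sample data are all hypothetical, and it assumes you already have text fragments with y coordinates in points measured from the bottom of the page.

```python
from dataclasses import dataclass


@dataclass
class TextFragment:
    """Hypothetical extracted text fragment with its position in points."""
    text: str
    x: float  # left edge, measured from the left of the page
    y: float  # baseline, measured from the bottom of the page


def in_body(fragment, page_height, header=72.0, footer=72.0):
    """Return True if the fragment lies outside the header and footer bands."""
    return footer <= fragment.y <= page_height - header


# Made-up fragments on a US Letter page (792 pt high).
fragments = [
    TextFragment("Annual Report 2010", 72.0, 760.0),  # running header
    TextFragment("Chapter 1", 72.0, 650.0),           # body text
    TextFragment("Page 3", 290.0, 30.0),              # footer
]

body = [f for f in fragments if in_body(f, page_height=792.0)]
for f in body:
    print(f.x, f.y, f.text)
```

A real tool would let you specify such regions directly, which is more reliable than filtering after the fact.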

Kurt Pfeifle answered Oct 12 '22 08:10