
PDF Parsing with Text and Coordinates [closed]

I am currently using PDFBox to parse a PDF, and I am trying to figure out how to retrieve data about the text, such as its font (bold, size, etc.) and its location on the page.

Any suggestions?

A. Canyon asked Feb 03 '23 17:02



1 Answer

After poking around the (hard to find) PDFBox docs, I found this little gem.

Apparently one of the examples shows exactly how to do everything you asked. Basically, you subclass PDFTextStripper and override the processTextPosition method. There, you query each TextPosition for whatever information you need, such as its font, font size, and coordinates.
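As a rough illustration of that approach, here is a minimal sketch, assuming PDFBox 2.x or later (where PDFTextStripper and TextPosition live in org.apache.pdfbox.text). The class name PositionAwareStripper, the looksBold helper, and the record format are made up for this example and are not part of PDFBox. Checking the font name for "bold" is only a heuristic, since a PDF font does not necessarily advertise its weight in its name.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

// Hypothetical subclass: collects one description line per glyph run
// that PDFBox reports while extracting text.
class PositionAwareStripper extends PDFTextStripper {

    private final List<String> records = new ArrayList<>();

    PositionAwareStripper() throws IOException {
        super();
    }

    // Heuristic only (an assumption of this sketch): many bold fonts
    // carry "Bold" in their name, e.g. "Helvetica-Bold", but not all do.
    static boolean looksBold(String fontName) {
        return fontName != null
                && fontName.toLowerCase(Locale.ROOT).contains("bold");
    }

    @Override
    protected void processTextPosition(TextPosition text) {
        PDFont font = text.getFont();
        String fontName = (font != null) ? font.getName() : null;

        // Record the text, its font, its size in points, and its
        // position adjusted for page rotation and text direction.
        records.add(String.format(Locale.ROOT,
                "'%s' font=%s bold=%b size=%.1f x=%.1f y=%.1f",
                text.getUnicode(),
                fontName,
                looksBold(fontName),
                text.getFontSizeInPt(),
                text.getXDirAdj(),
                text.getYDirAdj()));

        // Let the base class keep building the normal text output.
        super.processTextPosition(text);
    }

    List<String> getRecords() {
        return records;
    }
}
```

To use it, load a document (PDDocument.load(file) in PDFBox 2.x, Loader.loadPDF(file) in 3.x), call getText(document) on the stripper, and then read getRecords(). The callback only fires while getText is running, so the list stays empty until then.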

For future reference, you can find the Javadoc here: http://pdfbox.apache.org/apidocs/index.html

Edit 2018-04-02: the original link is dead, but the example can be found in the SVN repo here.

Mark Storer answered Feb 05 '23 07:02
