
Looking for a PDF file parser [closed]

Does anyone know of a PDF file parser that I could use to pull sections of text out of a plaintext PDF file? Specifically, I want a way to reliably pull out the text that belongs to annotations.

Delphi, C#, or RegEx; I don't mind which.

Toby Allen asked Dec 31 '22 05:12



1 Answer

The PDF File Parser article on xactpro seems to be exactly what you need. It explains the format of the PDF and comes with full source code for a parser (and another project for visualisation of the model).

The parser uses format-specific terms, but you could easily use the visualiser to learn what to look for.
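If a regex-only approach is acceptable, the idea can be sketched roughly as follows in Python. This is a minimal sketch and not the xactpro parser: it assumes the PDF is uncompressed (no object streams), that each annotation dictionary carries `/Type /Annot`, and that the annotation text is stored as a literal `( ... )` string under `/Contents` with any inner parentheses backslash-escaped. The names `annotation_texts` and `unescape` are made up for illustration.

```python
import re

# One indirect object: "<num> <gen> obj ... endobj".
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
# Assumption: annotation dictionaries are tagged with /Type /Annot.
ANNOT_RE = re.compile(rb"/Type\s*/Annot\b")
# A /Contents literal string; allows backslash escapes, but not
# unescaped nested parentheses or hex strings (<...>).
CONTENTS_RE = re.compile(rb"/Contents\s*\(((?:\\.|[^\\()])*)\)", re.DOTALL)


def unescape(raw: bytes) -> str:
    """Resolve simple backslash escapes; octal escapes are not handled."""
    simple = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f"}

    def repl(match):
        ch = match.group(1)
        # Known control escapes map to their byte; anything else
        # (such as "(", ")" or "\") stands for itself.
        return simple.get(ch, ch)

    return re.sub(rb"\\(.)", repl, raw, flags=re.DOTALL).decode("latin-1")


def annotation_texts(pdf_bytes: bytes) -> list[tuple[int, str]]:
    """Return (object number, contents text) for each annotation found."""
    results = []
    for obj in OBJ_RE.finditer(pdf_bytes):
        body = obj.group(3)
        if not ANNOT_RE.search(body):
            continue
        contents = CONTENTS_RE.search(body)
        if contents:
            results.append((int(obj.group(1)), unescape(contents.group(1))))
    return results
```

Calling `annotation_texts` on the bytes of a file read in binary mode gives a list of object numbers paired with annotation text. For compressed or encrypted files the regexes will find nothing, and a real parser is the safer route.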

Richard Szalay answered Jan 04 '23 15:01
