
Programmatic Reading of PDFs in C# [closed]

Tags: c#, pdf

I see many questions and answers about using C# to generate PDF files.
I have a related, but different task.

I have a large number of PDF files already created, and I would like to validate certain parts of their content with regular expressions. I want to open the PDFs in C# and read out the text in something approaching a linear fashion.

If headers, footers, sidebars, etc. get skipped or read out of order, it doesn't matter. I'm just after as much of the main-body text as I can retrieve.

Can you point me towards tools, libraries, APIs, etc. that will enable me to programmatically read text in PDF files?

abelenky asked Mar 09 '10 18:03


1 Answer

I used PDFSharp as recently as last autumn and found it very easy to use compared with other libraries. See the PDFSharp home page.
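Whichever library ends up doing the extraction, the validation step the question describes could look roughly like the sketch below. It assumes the page text has already been pulled out into a plain string; `PdfTextValidator`, `FindFailedRules`, the sample text, and the rule patterns are all hypothetical names and data made up for illustration, and only the standard .NET `Regex` class is used. No PDF library calls are shown.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PdfTextValidator
{
    // Hypothetical helper: returns the names of the rules whose
    // pattern was NOT found anywhere in the extracted text.
    public static List<string> FindFailedRules(
        string extractedText, IDictionary<string, string> rules)
    {
        // Extracted PDF text often has irregular spacing and line breaks,
        // so collapse every whitespace run to a single space first.
        string normalized = Regex.Replace(extractedText, @"\s+", " ").Trim();

        var failed = new List<string>();
        foreach (KeyValuePair<string, string> rule in rules)
        {
            // rule.Key is a label for reporting, rule.Value is the pattern.
            if (!Regex.IsMatch(normalized, rule.Value, RegexOptions.IgnoreCase))
            {
                failed.Add(rule.Key);
            }
        }
        return failed;
    }

    public static void Main()
    {
        // Stand-in for text extracted from one PDF (assumed, not real output).
        string sample = "Invoice  No: 12345\nTotal\tdue: 99.50 USD";

        var rules = new Dictionary<string, string>
        {
            { "invoice number", @"Invoice No: \d+" },
            { "due date", @"Due date: \d{4}-\d{2}-\d{2}" }
        };

        foreach (string name in FindFailedRules(sample, rules))
        {
            Console.WriteLine("Missing: " + name);
        }
    }
}
```

Normalizing whitespace before matching is a design choice that fits the question's tolerance for imperfect extraction: the patterns can then be written against single spaces, regardless of how the extractor split lines.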

Will Marcouiller answered Oct 12 '22 09:10