
Reliable way to (programmatically) compare PDFs? [duplicate]

Possible Duplicate:
Tool to compare large numbers of PDF files?

I am in the classic scenario where the business hands you a bunch of new PDF forms for the new year, with no revision notes whatsoever, and you are supposed to figure out what has changed from the previous year's versions.

I am talking about loads of forms here, so I am trying to find a way to compare PDFs and highlight the differences without having people manually go through each and every one of them.

My idea was to extract all the text from each PDF, dump it into a .txt file, and then diff the text files, but that sounds horrible.

My question says "programmatically", but I'd be happy with any reliable tool for comparing PDFs, and I am mainly looking to get an idea from people's experiences. I am also willing to entertain any programmatic solutions (preferably in C#, but please throw out any ideas).
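
For reference, the text-dump idea can be sketched roughly as follows. This is a minimal sketch in Python rather than C#, and it assumes the text has already been extracted from each PDF into one string per page by whatever extraction tool is at hand; the extraction step itself is not shown. The function name `diff_extracted_text` and the sample form text are made up for illustration.

```python
import difflib


def diff_extracted_text(old_pages, new_pages, old_name="old.pdf", new_name="new.pdf"):
    """Return unified-diff lines for two PDFs whose text was already extracted.

    old_pages and new_pages are lists of strings, one string per page.
    (Hypothetical helper; the PDF text extraction itself is assumed to
    happen elsewhere.)
    """
    def to_lines(pages):
        lines = []
        for number, text in enumerate(pages, start=1):
            # Mark page boundaries so a diff hunk can be traced back to a page.
            lines.append(f"[page {number}]")
            for line in text.splitlines():
                # Collapse runs of whitespace and skip blank lines, since
                # extracted text often differs only in spacing.
                normalized = " ".join(line.split())
                if normalized:
                    lines.append(normalized)
        return lines

    return list(difflib.unified_diff(
        to_lines(old_pages), to_lines(new_pages),
        fromfile=old_name, tofile=new_name, lineterm="",
    ))


if __name__ == "__main__":
    # Made-up sample data standing in for extracted form text.
    last_year = ["Form A\nTax year: 2009\nName:"]
    this_year = ["Form A\nTax year: 2010\nName:\nEmail:"]
    for line in diff_extracted_text(last_year, this_year):
        print(line)
```

This only catches changes in the text. Layout, fonts, and images are invisible to it, which is part of why the approach feels fragile.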

JohnIdol asked Sep 30 '10 21:09


1 Answer

I am a developer of the Docotic.Pdf library. We use PDF comparison in our unit tests to check that each test produces the expected PDF. A PDF is a collection of special objects, and we compare all of the PDF objects while ignoring some properties, such as trailer IDs and creator info. This implementation works fine.

You can try the PdfDocument.DocumentsAreEqual method. It only tells you whether two documents are equal; it does not report the specific differences. You may contact us if you need more functionality.
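
To illustrate the general idea of comparing objects while skipping volatile properties, here is a minimal sketch in Python. It is not the Docotic.Pdf implementation. It assumes each PDF has already been parsed into nested dictionaries and lists, and the set of ignored keys (`ID`, `Creator`, `Producer`) is an assumption chosen for the example. The name `objects_equal` is hypothetical.

```python
# Keys treated as volatile. The answer mentions trailer IDs and creator info;
# the exact key names listed here are an assumption made for this sketch.
VOLATILE_KEYS = frozenset({"ID", "Creator", "Producer"})


def objects_equal(a, b, ignore=VOLATILE_KEYS):
    """Recursively compare two parsed object trees, skipping ignored dict keys."""
    if isinstance(a, dict) and isinstance(b, dict):
        keys_a = set(a) - ignore
        keys_b = set(b) - ignore
        # Both sides must have the same non-volatile keys with equal values.
        return keys_a == keys_b and all(
            objects_equal(a[key], b[key], ignore) for key in keys_a
        )
    if isinstance(a, list) and isinstance(b, list):
        # Lists must match element by element, in order.
        return len(a) == len(b) and all(
            objects_equal(x, y, ignore) for x, y in zip(a, b)
        )
    # Leaf values (strings, numbers, and so on) are compared directly.
    return a == b
```

Like the method described above, this returns only a yes/no answer. Reporting where two documents differ would mean tracking the path to each mismatch as well.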

Vitaliy Shibaev answered Oct 06 '22 14:10