Writing JUnit tests for PDFs generated by iText

I am curious whether anyone has experience writing JUnit tests for PDFs generated in Java (especially by iText). I did a quick search on Google and could not find anything specific. So far I have been able to check that the PDF has been generated, that it has a certain number of pages, and that the document is closed, but I was unable to verify the content of the document. Can someone provide an example of what they did in the past to achieve this? Or am I completely wrong, and JUnit tests for my PDFs are overkill? Thanks

Tomas Babic asked May 02 '12 21:05


1 Answer

Since you are using Java, I would look at Apache PDFBox. What you are asking is quite challenging, because a PDF that has been read back in and transformed again may not be syntactically identical to your original. You may need to think in terms of roundtripping.
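One way to apply the roundtripping idea: keep the data you handed to iText, extract the text back out of the generated PDF, and check that the data survived. The sketch below assumes the text has already been extracted (for example with PDFBox's PDFTextStripper.getText(PDDocument)) and only checks that each expected value appears, in order; the class and method names are hypothetical. Checking ordered containment rather than exact equality ignores labels and layout text you do not care about.

```java
import java.util.List;

/** Hypothetical helper: checks that values written into a PDF survive a text round trip. */
final class RoundTripCheck {

    /**
     * Returns true if every expected value occurs in the extracted text,
     * each one starting after the end of the previous match.
     */
    static boolean containsInOrder(String extractedText, List<String> expectedValues) {
        int from = 0;
        for (String value : expectedValues) {
            int idx = extractedText.indexOf(value, from);
            if (idx < 0) {
                return false; // missing, or only present before the previous match
            }
            from = idx + value.length();
        }
        return true;
    }

    public static void main(String[] args) {
        // In a real test this string would be extracted from the generated PDF (assumption: via PDFBox).
        String extracted = "Invoice 42\nCustomer: ACME Corp\nTotal: 99.50 EUR\n";
        List<String> written = List.of("Invoice 42", "ACME Corp", "99.50");
        System.out.println(containsInOrder(extracted, written)); // true
    }
}
```

In a JUnit test you would wrap the call in assertTrue, keeping the expected values next to the code that feeds them to iText so that both change together.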

Formats such as PDF can be fragile with respect to comparison: when a comparison fails, it may give little indication of where the failure is. A PDF document can be extremely complex (highly branched trees of objects). You may need to canonicalize the documents before comparing them (I do this for XML documents).
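A minimal sketch of canonicalizing extracted text before comparing it, in the spirit of the XML canonicalization mentioned above. The normalization rules (Unicode NFC, unified line endings, collapsed spaces and tabs, trimmed lines, blank lines dropped) are assumptions chosen for illustration; pick rules that match the differences you actually want to ignore. On failure the hypothetical helper reports the first differing canonical line, which gives some indication of where the failure is.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: canonicalizes extracted PDF text so layout noise does not break comparisons. */
final class TextCanonicalizer {

    /** NFC-normalizes, splits on any line ending, collapses horizontal whitespace, drops blank lines. */
    static List<String> canonicalLines(String text) {
        String nfc = Normalizer.normalize(text, Normalizer.Form.NFC);
        List<String> lines = new ArrayList<>();
        for (String line : nfc.split("\\r\\n|\\r|\\n")) {
            String collapsed = line.replaceAll("[ \\t\\u00A0]+", " ").trim();
            if (!collapsed.isEmpty()) {
                lines.add(collapsed);
            }
        }
        return lines;
    }

    /** Returns null if the texts match after canonicalization, else a message naming the first difference. */
    static String firstDifference(String expected, String actual) {
        List<String> e = canonicalLines(expected);
        List<String> a = canonicalLines(actual);
        int n = Math.min(e.size(), a.size());
        for (int i = 0; i < n; i++) {
            if (!e.get(i).equals(a.get(i))) {
                return "line " + (i + 1) + ": expected <" + e.get(i) + "> but was <" + a.get(i) + ">";
            }
        }
        if (e.size() != a.size()) {
            return "expected " + e.size() + " lines but was " + a.size();
        }
        return null;
    }

    public static void main(String[] args) {
        String expected = "Total:  99.50\nThank you";
        String actual = "Total: 99.50\r\n\r\n  Thank you  \r\n";
        System.out.println(firstDifference(expected, actual)); // null
    }
}
```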

My guess is that a complete test is overkill and that your current tests are as good as possible at reasonable cost.

UPDATE: I have checked PDFBox for PDDocument.equals(PDDocument) and there is no deep equals method. This suggests they haven't found it worthwhile (it requires recursion over many subnodes). Also, a PDF contains many real numbers, so these would all have to be compared within a tolerance.
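To illustrate the tolerance point: a minimal sketch, assuming the real numbers have already been pulled out of the two documents into arrays (for example page widths and heights, or text positions). The class name and the tolerance value are hypothetical. An absolute tolerance is used because PDF coordinates are in points on a page of bounded size; a relative tolerance would suit values of widely varying magnitude better.

```java
/** Hypothetical helper: compares real-valued PDF properties within an absolute tolerance. */
final class ToleranceCompare {

    /** True if both arrays have the same length and every pair differs by at most tolerance. */
    static boolean approxEquals(double[] expected, double[] actual, double tolerance) {
        if (expected.length != actual.length) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            // Written as !(diff <= tol) so that a NaN on either side counts as a mismatch.
            if (!(Math.abs(expected[i] - actual[i]) <= tolerance)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] written = {595.276, 841.89}; // e.g. a page size as generated
        double[] readBack = {595.28, 841.89}; // the same size after a round trip
        System.out.println(approxEquals(written, readBack, 0.01)); // true
    }
}
```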

Comparing rendered bitmaps may work for a human reviewer, but it is very sensitive to real-number problems: a rounding error will write a bit to a different pixel. It will almost certainly behave differently with a new OS version.
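If you do try a bitmap comparison, one way to soften the rounding sensitivity is to tolerate small per-channel differences and report the fraction of pixels that still differ. This sketch assumes both pages have already been rendered to BufferedImage (for example with PDFBox's PDFRenderer.renderImageWithDPI); the class name and thresholds are hypothetical. It only reduces the problem: anti-aliasing or font changes on a new OS can still move whole glyph edges past any threshold.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: tolerant pixel comparison of two rendered pages. */
final class BitmapCompare {

    /** Largest absolute difference among the R, G and B channels of two packed RGB pixels (alpha ignored). */
    static int maxChannelDelta(int rgb1, int rgb2) {
        int max = 0;
        for (int shift = 0; shift <= 16; shift += 8) {
            int c1 = (rgb1 >> shift) & 0xFF;
            int c2 = (rgb2 >> shift) & 0xFF;
            max = Math.max(max, Math.abs(c1 - c2));
        }
        return max;
    }

    /** Fraction of pixels differing by more than channelTolerance, or 1.0 if the sizes differ. */
    static double differingFraction(BufferedImage a, BufferedImage b, int channelTolerance) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return 1.0;
        }
        long differing = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (maxChannelDelta(a.getRGB(x, y), b.getRGB(x, y)) > channelTolerance) {
                    differing++;
                }
            }
        }
        return (double) differing / ((long) a.getWidth() * a.getHeight());
    }

    public static void main(String[] args) {
        BufferedImage expected = new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB);
        BufferedImage actual = new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB);
        actual.setRGB(3, 4, 0x030303); // slight shade change: within tolerance
        actual.setRGB(5, 5, 0xFFFFFF); // one pixel flipped from black to white
        System.out.println(differingFraction(expected, actual, 8)); // 0.01
    }
}
```

A test would then assert that the fraction stays below some small bound chosen by experiment, rather than requiring it to be zero.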

peter.murray.rust answered Oct 05 '22 11:10