Does anybody know of an open-source Java library that will do robust diffing of the text parts of PDF files?
Ideally I would like something that would produce a diff in the form of a patch.
Extract the PDF text with PDFBox (http://incubator.apache.org/pdfbox/) and create a diff of the extracted text with google-diff-match-patch (http://code.google.com/p/google-diff-match-patch).
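To make the second step concrete, here is a minimal sketch of the diffing half, assuming both PDFs have already been reduced to plain text (for instance with PDFBox's `PDFTextStripper`). It does not use google-diff-match-patch: it is a plain line-level longest-common-subsequence diff written against the standard library only, so the class name `TextDiffSketch`, the method `diffLines`, and the ' '/'-'/'+' line prefixes are all made up for illustration. The real library works at character level and has its own patch format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the diff step; not the diff-match-patch API.
public class TextDiffSketch {

    /** Line-level diff; each output line is prefixed with ' ' (same), '-' (removed) or '+' (added). */
    public static List<String> diffLines(String oldText, String newText) {
        String[] a = oldText.split("\\R", -1);
        String[] b = newText.split("\\R", -1);

        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[a.length + 1][b.length + 1];
        for (int i = a.length - 1; i >= 0; i--) {
            for (int j = b.length - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk both inputs, emitting common lines and the edits between them.
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i].equals(b[j])) {
                out.add(" " + a[i]);
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("-" + a[i]);
                i++;
            } else {
                out.add("+" + b[j]);
                j++;
            }
        }
        while (i < a.length) out.add("-" + a[i++]);
        while (j < b.length) out.add("+" + b[j++]);
        return out;
    }

    public static void main(String[] args) {
        // Assumed inputs: text already extracted from two versions of a PDF.
        String oldText = "Title\nFirst paragraph\nFooter";
        String newText = "Title\nRevised paragraph\nFooter";
        diffLines(oldText, newText).forEach(System.out::println);
    }
}
```

The table is quadratic in the number of lines, which is fine for a short document but probably not for a large one. Extracted PDF text also tends to vary in whitespace and line breaks between versions, so some normalisation before diffing may be needed.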