Comparing a signed PDF to an unsigned PDF using document hash

Tags: c#, pdf, itextsharp

After extensive Google searches, I'm starting to wonder if I'm missing the point of digital signatures in some way.

This is what I believe I should be able to do in principle, and I'm hoping iTextSharp will let me do it:

I'm writing in C# and .NET and using iTextSharp to parse PDF files. I have an unsigned PDF file, and also a signed version of the same file.

I'm aware that a digital signature fundamentally hashes the PDF data and encrypts that hash with a private key, and that part of the verification process is to decrypt it using the public key and ensure the result matches the hash of the PDF data when it is computed again.

In addition, I want to get this decrypted document hash and compare it to a document hash generated from my unsigned PDF. This is because I not only want to verify that the signed PDF is authentic, but also that it's the same unsigned PDF I have on record. I suppose I could also do this by comparing the PDF data (without the signature) with my PDF data on record.

I currently haven't worked out how to do any of this! Specifically:

  1. How do I extract PDF data from a signed PDF excluding the signature?
  2. Alternatively how do I generate a hash from an unsigned PDF?
  3. Along with 2., how do I extract a decrypted hash from a PDF signature?

Hope this is clear, and I haven't missed the point somewhere!

splidje asked Aug 22 '12 13:08


1 Answer

About this:

"This is because I not only want to verify that the signed PDF is authentic, but also that it's the same unsigned PDF I have on record"

Assuming you just want to know that a document you get on your server is authentic:

When creating a signed document, you have the choice of signing only one part of the file or the entire document. If you require a "whole document" signature and the document you get back on your server is "authentic" (meaning that the verification of the signature succeeded), then its signed bytes have not changed since signing. Provided the signer signed the file you sent out, it is therefore the same document you have on record.
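As a rough illustration of why a successful verification implies unchanged bytes, here is a minimal sketch that uses only the JDK's java.security.Signature, not iText. The class and method names are made up for this example, the key pair is a throwaway one, and a real PDF signature wraps the result in a CMS/PKCS#7 container rather than storing a raw RSA signature.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical helper class, for illustration only.
final class WholeDocumentSignatureSketch {

    // Generates a throwaway RSA key pair; a real signer would use a certificate-backed key.
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hashes the data with SHA-256 and signs that hash with the private key.
    static byte[] sign(byte[] data, PrivateKey key) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(key);
            signer.update(data);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true only if the signature was produced over exactly these bytes.
    static boolean verify(byte[] data, byte[] signature, PublicKey key) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(key);
            verifier.update(data);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair pair = newKeyPair();
        byte[] document = "stand-in for the signed bytes".getBytes(StandardCharsets.US_ASCII);
        byte[] signature = sign(document, pair.getPrivate());

        byte[] tampered = document.clone();
        tampered[0] ^= 1; // flip a single bit

        System.out.println("original verifies: " + verify(document, signature, pair.getPublic()));
        System.out.println("tampered verifies: " + verify(tampered, signature, pair.getPublic()));
    }
}
```

Note that this only shows the document is unchanged since it was signed. It says nothing about whether those bytes match a copy you hold, which is why identification is treated separately.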

It's worth mentioning that there are two types of PDF signatures, approval signatures and certification signatures. From the document Digital Signatures in PDF from Adobe:

(...) approval signatures, where someone signs a document to show consent, approval, or acceptance. A certified document is one that has a certification signature applied by the originator when the document is ready for use. The originator specifies what changes are allowed; choosing one of three levels of modification permitted:

  • no changes
  • form fill-in only
  • form fill-in and commenting

Assuming you want to match a signed document that you received on your server with its unsigned equivalent in a database:

For document identification, I would suggest dealing with it separately. Once a document can be opened, a hash (MD5, for example) can be computed over the concatenation of the decompressed content of all its pages and then compared with the same kind of hash of the original document (which can be generated once and stored in a database).

The reason I would do it this way is that it is independent of the type of signature that was used on the document. Even when form fields are edited in a PDF file, annotations are added, or new signatures are created, the page content is not modified; it remains the same.

If you are using iText, you can get a byte array of the page content by using the method PdfReader.getPageContent and use the result for computing an MD5 hash.

The code in Java might look like this:

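import java.security.MessageDigest;
import com.itextpdf.text.pdf.PdfReader; // iText 5; iText 2.x uses com.lowagie.text.pdf

// Hash the decompressed content of every page, in page order.
PdfReader reader = new PdfReader("myfile.pdf");
MessageDigest messageDigest = MessageDigest.getInstance("MD5");
int pageCount = reader.getNumberOfPages();
for (int i = 1; i <= pageCount; i++) { // page numbers are 1-based
    byte[] buf = reader.getPageContent(i);
    messageDigest.update(buf, 0, buf.length);
}
byte[] hash = messageDigest.digest();
reader.close();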
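To compare the result with the hash on record, something along these lines could work. This is only a sketch that uses the JDK and no iText: it assumes the page contents have already been read (for example with getPageContent as in the loop above) and that the stored value is a hex string. The class and method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class PageContentFingerprint {

    // Computes an MD5 digest over the page contents, fed in page order.
    static byte[] digestPages(List<byte[]> pageContents) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        for (byte[] page : pageContents) {
            md.update(page);
        }
        return md.digest();
    }

    // Converts a digest to lowercase hex so it can be stored in a text column.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Returns true if the pages hash to the value stored on record (assumed to be hex).
    static boolean matchesRecord(List<byte[]> pageContents, String storedHex) {
        return toHex(digestPages(pageContents)).equalsIgnoreCase(storedHex);
    }

    public static void main(String[] args) {
        // Stand-ins for the byte arrays that getPageContent would return.
        List<byte[]> pages = Arrays.asList(
                "BT (Page one) Tj ET".getBytes(StandardCharsets.US_ASCII),
                "BT (Page two) Tj ET".getBytes(StandardCharsets.US_ASCII));
        String onRecord = toHex(digestPages(pages)); // computed once and stored
        System.out.println("matches record: " + matchesRecord(pages, onRecord));
    }
}
```

Because the pages are simply concatenated, this identifies the page content as a whole, not where one page ends and the next begins.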

Additionally, if the server receives a file that went out unsigned and came back signed, the signature may cover just one part of the file rather than all of it. In this scenario, the signature digests might not be enough to identify the file.

From the PDF specification (emphasis mine):

Signatures are created by computing a digest of the data (or part of the data) in a document, and storing the digest in the document. (...) There are two defined techniques for computing a reproducible digest of the contents of all or part of a PDF file:

  • A byte range digest is computed over a range of bytes in the file, indicated by the ByteRange entry in the signature dictionary. This range is typically the entire file, including the signature dictionary but excluding the signature value itself (the Contents entry).

  • An object digest (PDF 1.5) is computed by selectively walking a subtree of objects in memory, beginning with the referenced object, which is typically the root object. The resulting digest, along with information about how it was computed, is placed in a signature reference dictionary (...).
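To make the byte range digest more concrete, here is a minimal JDK-only sketch. It assumes you have already read the ByteRange array out of the signature dictionary (an array of offset/length pairs) and have the whole file in memory. The class and method names are invented for the example, and the check for whole-file coverage is a simplification that only handles the typical two-range layout.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
final class ByteRangeDigestSketch {

    // Digests only the bytes named by the (offset, length) pairs in byteRange.
    static byte[] digest(byte[] file, int[] byteRange, String algorithm) {
        if (byteRange.length % 2 != 0) {
            throw new IllegalArgumentException("ByteRange must hold offset/length pairs");
        }
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        for (int i = 0; i < byteRange.length; i += 2) {
            md.update(file, byteRange[i], byteRange[i + 1]);
        }
        return md.digest();
    }

    // True if the ranges start at byte 0, end at the end of the file,
    // and skip exactly one gap (where the Contents value would sit).
    static boolean coversWholeFile(int[] byteRange, int fileLength) {
        return byteRange.length == 4
                && byteRange[0] == 0
                && byteRange[2] > byteRange[1]
                && byteRange[2] + byteRange[3] == fileLength;
    }

    public static void main(String[] args) {
        // Toy "file": five bytes, a five-byte stand-in for the signature value, five bytes.
        byte[] file = "HELLO<sig>WORLD".getBytes(StandardCharsets.US_ASCII);
        int[] byteRange = {0, 5, 10, 5};

        byte[] signedPart = digest(file, byteRange, "SHA-256");
        byte[] expected = digest("HELLOWORLD".getBytes(StandardCharsets.US_ASCII),
                new int[] {0, 10}, "SHA-256");

        System.out.println("digest skips the gap: " + Arrays.equals(signedPart, expected));
        System.out.println("covers whole file: " + coversWholeFile(byteRange, file.length));
    }
}
```

A signature whose ranges fail the coverage check (for instance because the file was updated after signing) only vouches for part of the file, which is the case where the signature digest alone cannot identify the document.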

yms answered Sep 21 '22 10:09