
Remove Byte Order Mark from signed PDF file?

I am using iTextSharp 5.5.1 in order to sign PDF files digitally with a detached signature (obtained from a third party authority). Everything seems to work fine, the file is valid and e.g. Adobe Reader reports no problems, displays the signatures as valid etc.

The problem is that the Java clients apparently have trouble with those files: they can neither open nor parse them.
The files contain a byte order mark (0xEF 0xBB 0xBF) in the header, which seems to cause this behavior.

I could identify the BOM like this:

PdfReader reader = new PdfReader(path);
byte[] metadata = reader.Metadata;
// metadata[0], metadata[1], metadata[2] contain the BOM

How can I either remove the BOM (without losing the validity of the signature), or force the iTextSharp library not to append these bytes into the files?

lukasz asked Oct 09 '14 13:10



2 Answers

First things first: once a PDF is signed, you shouldn't change any byte of that PDF, because you invalidate the signature if you do.

Second observation: the byte order mark is not part of the PDF header (a PDF always starts with %PDF-1.). In this context, it is the value of the begin attribute in the processing instruction of XMP metadata. I don't know of any Java client that has a problem with that byte sequence anywhere in a file. If they do have a problem with it, there's a problem with that client, not with the file.
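To check whether the byte order mark really sits at the very start of the file, rather than inside the XMP stream, one could inspect the first few bytes. The following is only a minimal sketch: the class and method names (PdfHeaderCheck, Classify, ClassifyFile) and the returned labels are hypothetical and not part of iTextSharp.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of iTextSharp: classifies the first bytes of a file.
public static class PdfHeaderCheck
{
    private static readonly byte[] Utf8Bom = { 0xEF, 0xBB, 0xBF };
    private static readonly byte[] PdfMagic = { 0x25, 0x50, 0x44, 0x46, 0x2D }; // "%PDF-"

    // True when data begins with all bytes of prefix.
    public static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    // Returns "pdf-header", "bom-first" or "other" for the given leading bytes.
    public static string Classify(byte[] head)
    {
        if (StartsWith(head, PdfMagic)) return "pdf-header";
        if (StartsWith(head, Utf8Bom)) return "bom-first";
        return "other";
    }

    // Reads up to 8 bytes from the start of the file and classifies them.
    public static string ClassifyFile(string path)
    {
        byte[] head = new byte[8];
        using (FileStream fs = File.OpenRead(path))
        {
            int read = fs.Read(head, 0, head.Length);
            Array.Resize(ref head, read);
        }
        return Classify(head);
    }
}
```

If ClassifyFile reports "pdf-header" for the signed file, the BOM is not in the file header and is most likely the one inside the XMP packet.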

The Byte Order Mark is an indication of the presence of UTF-8 characters. In the context of XMP, we have a stream inside the PDF that contains a clear text XML file that can be consumed by software that is not "PDF aware". For instance:

2 0 obj
<</Type/Metadata/Subtype/XML/Length 3492>>stream
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.1.0-jc003">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
        xmlns:xmp="http://ns.adobe.com/xap/1.0/"
      dc:format="application/pdf"
      pdf:Keywords="Metadata, iText, PDF"
      pdf:Producer="iText® 5.5.4-SNAPSHOT ©2000-2014 iText Group NV (AGPL-version); modified using iText® 5.5.4-SNAPSHOT ©2000-2014 iText Group NV (AGPL-version)"
      xmp:CreateDate="2014-11-07T16:36:55+01:00"
      xmp:CreatorTool="My program using iText"
      xmp:ModifyDate="2014-11-07T16:36:56+01:00"
      xmp:MetadataDate="2014-11-07T16:36:56+01:00">
      <dc:description>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">This example shows how to add metadata</rdf:li>
        </rdf:Alt>
      </dc:description>
      <dc:creator>
        <rdf:Seq>
          <rdf:li>Bruno Lowagie</rdf:li>
        </rdf:Seq>
      </dc:creator>
      <dc:subject>
        <rdf:Bag>
          <rdf:li>Metadata</rdf:li>
          <rdf:li>iText</rdf:li>
          <rdf:li>PDF</rdf:li>
        </rdf:Bag>
      </dc:subject>
      <dc:title>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">Hello World example</rdf:li>
        </rdf:Alt>
      </dc:title>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>                                                            
<?xpacket end="w"?>
endstream

Such non-PDF-aware software will look for the sequence W5M0MpCehiHzreSzNTczkc9d, which is a sequence that is unlikely to appear by accident in a data stream.
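As an illustration of what such a scan might look like, here is a rough sketch that searches raw file bytes for the packet id. It assumes the XMP stream is stored as clear text (uncompressed), as in the example above; the class and method names (XmpPacketScanner, IndexOf, FindPacketId) are made up for this sketch.

```csharp
using System;
using System.Text;

// Hypothetical scanner, for illustration only: finds the XMP packet id in raw bytes.
public static class XmpPacketScanner
{
    public const string PacketId = "W5M0MpCehiHzreSzNTczkc9d";

    // Naive byte search: index of the first occurrence of needle, or -1 if absent.
    public static int IndexOf(byte[] haystack, byte[] needle)
    {
        if (needle.Length == 0) return 0;
        for (int i = 0; i <= haystack.Length - needle.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return i;
        }
        return -1;
    }

    // Offset of the packet id inside the file bytes, or -1 when no packet is visible.
    public static int FindPacketId(byte[] fileBytes)
    {
        return IndexOf(fileBytes, Encoding.ASCII.GetBytes(PacketId));
    }
}
```

A result of -1 only means the id is not visible in clear text; it does not prove the PDF has no metadata.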

The begin attribute is there to indicate that the characters in the stream use UTF-8 encoding. The BOM bytes are present because including them is good practice, but they are not mandatory (ISO 16684-1).

You could retrieve the metadata the way you do (byte[] metadata = reader.Metadata;), remove the bytes, and change the stream with a PdfStamper instance like this:

 stamper.XmpMetadata = metadata;

After you have changed the metadata, you can sign the PDF.
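The "remove the bytes" step could look roughly like the sketch below. It only handles a UTF-8 BOM at the very start of the array, which is the situation described in the question; the class and method names (XmpBom, StripLeadingUtf8Bom) are hypothetical, and the null check is a defensive assumption for documents without metadata.

```csharp
using System;

// Hypothetical helper, not part of iTextSharp: drops a leading UTF-8 BOM from a byte array.
public static class XmpBom
{
    public static byte[] StripLeadingUtf8Bom(byte[] data)
    {
        // Assumption: the metadata array may be null when the document has no XMP.
        if (data == null) return null;

        bool hasBom = data.Length >= 3
            && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;
        if (!hasBom) return data; // nothing to strip, return the input unchanged

        // Copy everything after the three BOM bytes into a new array.
        byte[] trimmed = new byte[data.Length - 3];
        Buffer.BlockCopy(data, 3, trimmed, 0, trimmed.Length);
        return trimmed;
    }
}
```

With the reader and stamper from this answer, the result would be assigned before signing, e.g. stamper.XmpMetadata = XmpBom.StripLeadingUtf8Bom(reader.Metadata);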

Note that one aspect of your question surprises me. You write:

// metadata[0], metadata[1], metadata[2] contain the BOM

It is very strange that the first three bytes of the XMP metadata contain the BOM. XMP metadata is supposed to start with <?xpacket. If it doesn't, you are doing the right thing by removing those bytes.

Caveat: a PDF can contain XMP metadata at different levels. Right now, you are examining the most common one: document-level metadata. You may encounter PDFs with page-level XMP metadata, with XMP inside an image, etc.

Bruno Lowagie answered Oct 05 '22 08:10


Just a quick approach:

First: save both files unencrypted. Second: remove metadata[0] through metadata[2] (the three BOM bytes) before saving the file.

There are some considerations however: does the signing method require a BOM? Does the encryption method require a BOM?

You will also have to ascertain at what stage the BOM is added before you can determine whether you can/should remove the BOM.

I will have a quick hunt for my PDF structure docs and see what I can find. However, the simplest way (untried) would be to load the whole file as a byte array, remove 0xEF 0xBB 0xBF from the start of the file, and only then do any signing/encryption. The library may add the bytes back in again, though...

I will post an update over the weekend :)

GMasucci answered Oct 05 '22 09:10