Remove Byte Order Mark from signed PDF file?

Tags: c#, pdf, byte-order-mark, itextsharp, digital-signature

I am using iTextSharp 5.5.1 to digitally sign PDF files with a detached signature (obtained from a third-party authority). Everything seems to work: the file is valid, and Adobe Reader, for example, reports no problems and displays the signatures as valid.

The problem is that the Java clients apparently have problems with these files: the files can be neither opened nor parsed.
The files have a byte order mark in the header (0xEF 0xBB 0xBF), which seems to cause this behavior.

I could identify the BOM like this:

PdfReader reader = new PdfReader(path);
byte[] metadata = reader.Metadata;
// metadata[0], metadata[1], metadata[2] contain the BOM

How can I either remove the BOM without invalidating the signature, or force iTextSharp not to add these bytes to the files?


asked Oct 09 '14 13:10

lukasz

2 Answers

First things first: once a PDF is signed, you shouldn't change any byte of that PDF, because doing so invalidates the signature.

Second observation: the byte order mark is not part of the PDF header (a PDF always starts with %PDF-1.x). In your file, those bytes are the value of the begin attribute in the xpacket processing instruction of the XMP metadata. I don't know of any Java client that has a problem with that byte sequence anywhere in a file. If one does, the problem lies with that client, not with the file.
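If you want to confirm where the three bytes actually are, you can compare the first bytes of the signed file with %PDF- and with EF BB BF. Below is a minimal C# sketch that uses only the .NET base class library; the class and method names (PdfStartCheck, DescribeStart) are hypothetical and not part of iTextSharp:

```csharp
using System.Text;

// Hypothetical helper, not part of iTextSharp.
public static class PdfStartCheck
{
    // UTF-8 encoding of U+FEFF, i.e. the byte order mark.
    private static readonly byte[] Utf8Bom = { 0xEF, 0xBB, 0xBF };

    // A PDF file is expected to begin with this ASCII prefix.
    private static readonly byte[] PdfHeader = Encoding.ASCII.GetBytes("%PDF-");

    // True when data begins with prefix.
    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    // Classifies the start of a file as "pdf-header", "utf8-bom" or "unknown".
    public static string DescribeStart(byte[] data)
    {
        if (StartsWith(data, PdfHeader)) return "pdf-header";
        if (StartsWith(data, Utf8Bom)) return "utf8-bom";
        return "unknown";
    }
}
```

Calling PdfStartCheck.DescribeStart(File.ReadAllBytes(path)) on one of the signed files tells you which case you are in: "pdf-header" means the BOM sits somewhere inside the file (for instance in the XMP stream), not in the header.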

The byte order mark indicates that the text is encoded in UTF-8. In the case of XMP, the PDF contains a stream with a plain-text XML document that can be consumed by software that is not "PDF aware". For instance:

2 0 obj
<</Type/Metadata/Subtype/XML/Length 3492>>stream
<?xpacket begin="ï»¿" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.1.0-jc003">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
        xmlns:xmp="http://ns.adobe.com/xap/1.0/"
      dc:format="application/pdf"
      pdf:Keywords="Metadata, iText, PDF"
      pdf:Producer="iText® 5.5.4-SNAPSHOT ©2000-2014 iText Group NV (AGPL-version); modified using iText® 5.5.4-SNAPSHOT ©2000-2014 iText Group NV (AGPL-version)"
      xmp:CreateDate="2014-11-07T16:36:55+01:00"
      xmp:CreatorTool="My program using iText"
      xmp:ModifyDate="2014-11-07T16:36:56+01:00"
      xmp:MetadataDate="2014-11-07T16:36:56+01:00">
      <dc:description>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">This example shows how to add metadata</rdf:li>
        </rdf:Alt>
      </dc:description>
      <dc:creator>
        <rdf:Seq>
          <rdf:li>Bruno Lowagie</rdf:li>
        </rdf:Seq>
      </dc:creator>
      <dc:subject>
        <rdf:Bag>
          <rdf:li>Metadata</rdf:li>
          <rdf:li>iText</rdf:li>
          <rdf:li>PDF</rdf:li>
        </rdf:Bag>
      </dc:subject>
      <dc:title>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">Hello World example</rdf:li>
        </rdf:Alt>
      </dc:title>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>                                                            
<?xpacket end="w"?>
endstream

Such non-PDF-aware software looks for the packet ID W5M0MpCehiHzreSzNTczkc9d, a sequence that is unlikely to appear by accident in a data stream.
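To illustrate what such a packet scanner does, here is a minimal C# sketch (the class XmpPacketScanner is hypothetical, not an iTextSharp API) that finds the packet ID in raw bytes and walks back to the <?xpacket that opens the packet. It assumes the metadata stream is stored uncompressed, as in the listing above; a byte scanner like this cannot see into a compressed stream.

```csharp
using System.Text;

// Hypothetical helper, not part of iTextSharp.
public static class XmpPacketScanner
{
    // Packet ID used in the xpacket header (see the listing above).
    private static readonly byte[] PacketId = Encoding.ASCII.GetBytes("W5M0MpCehiHzreSzNTczkc9d");
    private static readonly byte[] PacketStart = Encoding.ASCII.GetBytes("<?xpacket");

    // True when pattern occurs in data at position pos.
    private static bool MatchesAt(byte[] data, byte[] pattern, int pos)
    {
        if (pos < 0 || pos + pattern.Length > data.Length) return false;
        for (int k = 0; k < pattern.Length; k++)
        {
            if (data[pos + k] != pattern[k]) return false;
        }
        return true;
    }

    // Returns the offset of the "<?xpacket" preceding the first occurrence
    // of the packet ID that has one, or -1 if no such packet is found.
    public static int FindPacketHeader(byte[] data)
    {
        for (int idPos = 0; idPos <= data.Length - PacketId.Length; idPos++)
        {
            if (!MatchesAt(data, PacketId, idPos)) continue;
            // Walk backwards from the ID to the nearest "<?xpacket".
            for (int i = idPos - PacketStart.Length; i >= 0; i--)
            {
                if (MatchesAt(data, PacketStart, i)) return i;
            }
        }
        return -1;
    }
}
```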

The begin attribute indicates that the characters in the stream use UTF-8 encoding. The BOM bytes are there because including them is good practice, but they are not mandatory (ISO 16684-1).
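Assuming, as described in the XMP specification, that the begin value is either empty or the character U+FEFF serialized in the packet's own encoding, its bytes reveal the encoding (and, for UTF-16 and UTF-32, the byte order), and an empty value means UTF-8. A minimal C# sketch of that mapping; the class name is hypothetical and the input is the raw bytes between the quotes of the begin attribute:

```csharp
// Hypothetical helper, not part of iTextSharp.
// Assumption: the begin value is empty or U+FEFF in the packet's encoding.
public static class XpacketEncoding
{
    // Maps the raw bytes of the begin attribute value to an encoding name.
    public static string FromBeginValue(byte[] value)
    {
        if (value.Length == 0) return "UTF-8"; // empty value is read as UTF-8
        if (Is(value, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (Is(value, 0xFE, 0xFF)) return "UTF-16BE";
        if (Is(value, 0xFF, 0xFE)) return "UTF-16LE";
        if (Is(value, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (Is(value, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        return "unknown";
    }

    // Exact byte-for-byte comparison, so FF FE and FF FE 00 00 cannot be confused.
    private static bool Is(byte[] value, params byte[] expected)
    {
        if (value.Length != expected.Length) return false;
        for (int i = 0; i < expected.Length; i++)
        {
            if (value[i] != expected[i]) return false;
        }
        return true;
    }
}
```

In the files from the question, the value is EF BB BF, i.e. UTF-8.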

You could retrieve the metadata the way you already do (byte[] metadata = reader.Metadata;), remove the bytes, and replace the metadata stream with a PdfStamper instance like this:

 stamper.XmpMetadata = metadata;

After you have changed the metadata, you can sign the PDF.
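If the BOM is where it normally is, as the value of the begin attribute, removing the bytes amounts to turning begin="ï»¿" into begin="", which is allowed because, as noted above, those bytes are not mandatory. A minimal C# sketch of that byte-level edit (the class and method names are hypothetical); it only touches the first begin=" that is followed by EF BB BF and a closing quote, and returns the input unchanged otherwise:

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of iTextSharp.
public static class XpacketBeginEditor
{
    private static readonly byte[] Prefix = Encoding.ASCII.GetBytes("begin=\"");
    private static readonly byte[] Bom = { 0xEF, 0xBB, 0xBF };

    // True when pattern occurs in data at position pos.
    private static bool MatchesAt(byte[] data, byte[] pattern, int pos)
    {
        if (pos < 0 || pos + pattern.Length > data.Length) return false;
        for (int k = 0; k < pattern.Length; k++)
        {
            if (data[pos + k] != pattern[k]) return false;
        }
        return true;
    }

    // Turns begin="<EF BB BF>" into begin="" at its first occurrence.
    public static byte[] ClearBeginValue(byte[] xmp)
    {
        for (int i = 0; i + Prefix.Length + Bom.Length < xmp.Length; i++)
        {
            int bomPos = i + Prefix.Length;
            if (MatchesAt(xmp, Prefix, i) && MatchesAt(xmp, Bom, bomPos)
                && xmp[bomPos + Bom.Length] == (byte)'"')
            {
                // Copy everything except the three BOM bytes.
                byte[] result = new byte[xmp.Length - Bom.Length];
                Array.Copy(xmp, 0, result, 0, bomPos);
                Array.Copy(xmp, bomPos + Bom.Length, result, bomPos,
                           xmp.Length - bomPos - Bom.Length);
                return result;
            }
        }
        return xmp;
    }
}
```

The result would then go to the stamper before signing, e.g. stamper.XmpMetadata = XpacketBeginEditor.ClearBeginValue(reader.Metadata);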

Note that one aspect of your question surprises me. You write:

// metadata[0], metadata[1], metadata[2] contain the BOM

It is very strange that the first three bytes of the XMP metadata contain the BOM. XMP metadata is supposed to start with <?xpacket. If it doesn't, you are right to remove those bytes.
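For the unusual case in the question, where bytes precede <?xpacket, a minimal C# sketch that drops everything before the first <?xpacket could look like this (hypothetical class name; it returns the input unchanged if the marker is missing or already at offset 0):

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of iTextSharp.
public static class XmpTrim
{
    private static readonly byte[] PacketStart = Encoding.ASCII.GetBytes("<?xpacket");

    // Drops every byte before the first "<?xpacket".
    public static byte[] TrimBeforePacket(byte[] xmp)
    {
        for (int i = 0; i + PacketStart.Length <= xmp.Length; i++)
        {
            bool match = true;
            for (int k = 0; k < PacketStart.Length && match; k++)
            {
                match = xmp[i + k] == PacketStart[k];
            }
            if (!match) continue;
            if (i == 0) return xmp; // already starts with <?xpacket
            byte[] result = new byte[xmp.Length - i];
            Array.Copy(xmp, i, result, 0, result.Length);
            return result;
        }
        return xmp; // marker not found: leave the metadata alone
    }
}
```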

Caveat: a PDF can contain XMP metadata at different levels. Right now, you are examining the most common one: document-level metadata. You may also encounter PDFs with page-level XMP metadata, XMP inside an image, and so on.

answered Oct 05 '22 08:10

Bruno Lowagie

Just a quick approach:

First, save both files unencrypted. Second, remove metadata bytes 0 through 2 before saving the file.

There are some considerations however: does the signing method require a BOM? Does the encryption method require a BOM?

You will also have to find out at what stage the BOM is added before you can decide whether you can, or should, remove it.

I will have a quick look for my PDF structure docs and see what I can find. The simplest (untried) approach, however, would be to load the whole file as a byte array, remove 0xEF 0xBB 0xBF from the start of the file, and then do any signing or encryption. However, those steps may add the BOM again...

I will post an update over the weekend:)
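The untried "strip the first three bytes of the file" idea could be sketched in C# as below (hypothetical names, .NET base class library only). Two caveats apply: it must run before signing, since changing any byte afterwards invalidates the signature, and a PDF is supposed to start with %PDF-, so a leading BOM is unexpected in the first place. Removing bytes from the front also shifts every byte offset in the file, so the cross-reference table may need to be rebuilt, for example by re-saving the file with a PDF library.

```csharp
using System.IO;

// Hypothetical helper, not part of iTextSharp.
public static class FileBom
{
    // Copies inputPath to outputPath, dropping a leading UTF-8 BOM if present.
    // Returns true if a BOM was removed.
    public static bool CopyWithoutLeadingBom(string inputPath, string outputPath)
    {
        byte[] data = File.ReadAllBytes(inputPath);
        bool hasBom = data.Length >= 3
            && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;
        int offset = hasBom ? 3 : 0;
        using (FileStream output = File.Create(outputPath))
        {
            output.Write(data, offset, data.Length - offset);
        }
        return hasBom;
    }
}
```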

answered Oct 05 '22 09:10

GMasucci
