
Make a PDF conforming to PDF/A with only images using iTextSharp

I'm using iTextSharp to generate PDF/A documents from images, but so far without success.

All I'm trying to do is make a PDF/A document (1a or 1b, whichever suits) containing some images. This is the code I've come up with so far, but I keep getting errors when I validate the output with pdf-tools or validatepdfa.

These are the errors I get from pdf-tools (using PDF/A-1b validation). Edit: MarkInfo and the color space aren't working yet; the rest is okay.

Validating file "0.pdf" for conformance level pdfa-1a
The key MarkInfo is required but missing.
A device-specific color space (DeviceRGB) without an appropriate output intent is used.
The document does not conform to the requested standard.
The document contains device-specific color spaces.
The document doesn't provide appropriate logical structure information.
Done.

Main flow

var output = new MemoryStream();
using (var iccProfileStream = new FileStream("ToPdfConverter/ColorProfiles/sRGB_v4_ICC_preference_displayclass.icc", FileMode.Open))
{
    var document = new Document(new Rectangle(PageSize.A4.Width, PageSize.A4.Height), 0f, 0f, 0f, 0f);
    var pdfWriter = PdfWriter.GetInstance(document, output);
    pdfWriter.PDFXConformance = PdfWriter.PDFA1A;
    document.Open();

    var pdfDictionary = new PdfDictionary(PdfName.OUTPUTINTENT);
    pdfDictionary.Put(PdfName.OUTPUTCONDITION, new PdfString("sRGB IEC61966-2.1"));
    pdfDictionary.Put(PdfName.INFO, new PdfString("sRGB IEC61966-2.1"));
    pdfDictionary.Put(PdfName.S, PdfName.GTS_PDFA1);

    var iccProfile = ICC_Profile.GetInstance(iccProfileStream);
    var pdfIccBased = new PdfICCBased(iccProfile);
    pdfIccBased.Remove(PdfName.ALTERNATE);
    pdfDictionary.Put(PdfName.DESTOUTPUTPROFILE, pdfWriter.AddToBody(pdfIccBased).IndirectReference);

    pdfWriter.ExtraCatalog.Put(PdfName.OUTPUTINTENT, new PdfArray(pdfDictionary));

    var image = PrepareImage(imageBytes);

    document.Add(image);

    pdfWriter.CreateXmpMetadata();

    pdfWriter.CloseStream = false;
    document.Close();
}
return output.ToArray(); // ToArray() returns only the bytes written; GetBuffer() can include unused trailing capacity

This is PrepareImage().
It flattens the image to BMP, so I don't need to worry about alpha channels.

private Image PrepareImage(Stream stream)
{
    Bitmap bmp = new Bitmap(System.Drawing.Image.FromStream(stream));
    var file = new MemoryStream();
    bmp.Save(file, ImageFormat.Bmp);
    var image = Image.GetInstance(file.ToArray());

    if (image.Height > PageSize.A4.Height || image.Width > PageSize.A4.Width)
    {
        image.ScaleToFit(PageSize.A4.Width, PageSize.A4.Height);
    }
    return image;
}

Can anyone point me in the right direction to fix these errors, specifically the one about device-specific color spaces?

Edit: More explanation: what I'm trying to achieve is converting scanned images to PDF/A for long-term data storage.

Edit: added some files I'm using to test with
PDFs and Pictures.rar (3.9 MB)
https://mega.co.nz/#!n8pClYgL!NJOJqSO3EuVrqLVyh3c43yW-u_U35NqeB0svc6giaSQ

Highmastdon asked Apr 09 '13 08:04


1 Answer

OK, I checked one of your files in callas pdfToolbox, and it reports: "Device color space used but no PDF/A output intent". I took that as a sign that something goes wrong when you write the output intent to the document. I then converted that document to PDF/A-1b with the same tool, and the difference is obvious.

Perhaps there are other errors you need to fix, but the first one is that the key you put in the PDF file's catalog dictionary is named "OutputIntent". That's wrong: page 75 of the PDF Specification states that the key should be named "OutputIntents".

Like I said, there may be other problems with your file beyond this, but the wrong key name prevents PDF/A validators from finding the output intent you are trying to put in the file.
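
As a rough illustration of the key-name fix, here is a minimal sketch that reuses the calls from the question's "Main flow" and changes only the catalog key to the plural PdfName.OUTPUTINTENTS. The class OutputIntentSketch and the method AddSrgbOutputIntent are hypothetical names introduced for this example and are not part of iTextSharp. The sketch assumes iTextSharp 5.x and that the method is called after document.Open(), as in the question. It has not been run through a PDF/A validator, so other conformance errors may remain.

```csharp
using iTextSharp.text.pdf;

// Hypothetical helper for illustration only; not part of iTextSharp.
static class OutputIntentSketch
{
    // Assumes the document attached to 'writer' has already been opened.
    public static void AddSrgbOutputIntent(PdfWriter writer, ICC_Profile iccProfile)
    {
        // The dictionary's own /Type is "OutputIntent" (singular); that part was fine.
        var intent = new PdfDictionary(PdfName.OUTPUTINTENT);
        intent.Put(PdfName.S, PdfName.GTS_PDFA1);
        intent.Put(PdfName.OUTPUTCONDITION, new PdfString("sRGB IEC61966-2.1"));
        intent.Put(PdfName.INFO, new PdfString("sRGB IEC61966-2.1"));

        // Embed the ICC profile as an indirect stream object, as the question does.
        var iccStream = new PdfICCBased(iccProfile);
        iccStream.Remove(PdfName.ALTERNATE);
        intent.Put(PdfName.DESTOUTPUTPROFILE, writer.AddToBody(iccStream).IndirectReference);

        // The fix: the catalog entry must be "OutputIntents" (plural) and hold an array.
        writer.ExtraCatalog.Put(PdfName.OUTPUTINTENTS, new PdfArray(intent));
    }
}
```

In the question's code this would replace the block between the first document.Open() and PrepareImage, for example AddSrgbOutputIntent(pdfWriter, ICC_Profile.GetInstance(iccProfileStream)).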

David van Driessche answered Sep 22 '22 02:09