I'm using iTextSharp to generate pdf-a documents from images. So far I've not been successful.
Edit: I'm using iTextSharp to generate the PDF
All I try is to make a pdf-a document (1a or 1b, whatever suits), with some images. This is the code I've come up so far, but I keep getting errors when I try to validate them with pdf-tools or validatepdfa.
This are the errors I get from pdf-tools (using PDF/A-1b validation): Edit: MarkInfo and Color Space arn't yet working. The rest is okay
Validating file "0.pdf" for conformance level pdfa-1a
The key MarkInfo is required but missing.
A device-specific color space (DeviceRGB) without an appropriate output intent is used.
The document does not conform to the requested standard.
The document contains device-specific color spaces.
The document doesn't provide appropriate logical structure information.
Done.
Main flow
var output = new MemoryStream();
using (var iccProfileStream = new FileStream("ToPdfConverter/ColorProfiles/sRGB_v4_ICC_preference_displayclass.icc", FileMode.Open))
{
var document = new Document(new Rectangle(PageSize.A4.Width, PageSize.A4.Height), 0f, 0f, 0f, 0f);
var pdfWriter = PdfWriter.GetInstance(document, output);
pdfWriter.PDFXConformance = PdfWriter.PDFA1A;
document.Open();
var pdfDictionary = new PdfDictionary(PdfName.OUTPUTINTENT);
pdfDictionary.Put(PdfName.OUTPUTCONDITION, new PdfString("sRGB IEC61966-2.1"));
pdfDictionary.Put(PdfName.INFO, new PdfString("sRGB IEC61966-2.1"));
pdfDictionary.Put(PdfName.S, PdfName.GTS_PDFA1);
var iccProfile = ICC_Profile.GetInstance(iccProfileStream);
var pdfIccBased = new PdfICCBased(iccProfile);
pdfIccBased.Remove(PdfName.ALTERNATE);
pdfDictionary.Put(PdfName.DESTOUTPUTPROFILE, pdfWriter.AddToBody(pdfIccBased).IndirectReference);
pdfWriter.ExtraCatalog.Put(PdfName.OUTPUTINTENT, new PdfArray(pdfDictionary));
var image = PrepareImage(imageBytes);
document.Open();
document.Add(image);
pdfWriter.CreateXmpMetadata();
pdfWriter.CloseStream = false;
document.Close();
}
return output.GetBuffer();
This is prepareImage()
It's used to flatten the image to bmp, so I don't need to bother about alpha channels.
private Image PrepareImage(Stream stream)
{
Bitmap bmp = new Bitmap(System.Drawing.Image.FromStream(stream));
var file = new MemoryStream();
bmp.Save(file, ImageFormat.Bmp);
var image = Image.GetInstance(file.GetBuffer());
if (image.Height > PageSize.A4.Height || image.Width > PageSize.A4.Width)
{
image.ScaleToFit(PageSize.A4.Width, PageSize.A4.Height);
}
return image;
}
Can anyone help me into a direction to fix the errors?
Specifically the device-specific color spaces
Edit: More explanation: What I'm trying to achieve is, converting scanned images to PDF/A for long-term data storage
Edit: added some files I'm using to test with
PDFs and Pictures.rar (3.9 MB)
https://mega.co.nz/#!n8pClYgL!NJOJqSO3EuVrqLVyh3c43yW-u_U35NqeB0svc6giaSQ
OK, I checked one of your files in callas pdfToolbox and it says: "Device color space used but no PDF/A output intent". Which I took as a sign that you do something wrong while writing an output intent to the document. I then converted that document to PDF/A-1b with the same tool and the difference is obvious.
Perhaps there are other errors you need to fix, but the first error here is that you put a key in the catalog dict for the PDF file that is named "OutputIntent". That's wrong: page 75 of the PDF Specification states that the key should be named "OutputIntents".
Like I said, perhaps there are other problems with your file beyond this, but the wrong name for the key causes PDF/A validators not to find the Output Intent you try to put in the file...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With