Convert PDF to PDF/A-3 or PDF/A-1 to PDF/A-3

I'm testing iTextSharp to generate ZUGFeRD files. My first step was to generate a ZUGFeRD-conformant file from an existing PDF/A-3 file. This was successful, using PdfACopy and creating the necessary PdfFileSpecification.

The next step is to generate a PDF/A-3 file from an existing PDF or PDF/A-1 file, and this is the hard part.

First, when I try to use PdfACopy with a regular PDF (not PDF/A), I get an error saying that PdfACopy can only be used with PDF/A-conformant files. My first question is: how do I get a PDF/A-3-conformant file from a regular PDF with iTextSharp?

To reduce the gap, I decided to convert the PDF into a PDF/A-1 file with Ghostscript (cf. "How to use ghostscript to convert PDF to PDF/A or PDF/X?"). This was successful, so I tried again. This time the error "Different PDF/A version." was thrown. It seems that I can't copy from an existing PDF/A-1 file into a new PDF/A-3 file. How can I create this PDF/A-3 file from an existing PDF(/A-1)? Is that even possible?

Here is my code:

    using System.IO;
    using System.Text;
    using System.Xml;
    using iTextSharp.text;
    using iTextSharp.text.pdf;
    using iTextSharp.text.xml.xmp;

    // XML, pdfPath, DEST and ICM are file paths defined elsewhere.
    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.Load(XML);
    // Assumes the invoice XML declares UTF-8; Encoding.Default depends on the machine's code page.
    byte[] xmlBytes = Encoding.UTF8.GetBytes(xmlDoc.OuterXml);

    Document doc = new Document();
    PdfReader src_reader = new PdfReader(pdfPath);

    FileStream fs = new FileStream(DEST, FileMode.Create, FileAccess.ReadWrite);

    PdfACopy aCopy = new PdfACopy(doc, fs, PdfAConformanceLevel.ZUGFeRD);

    doc.AddLanguage("de-DE");
    doc.AddTitle("title");
    doc.SetPageSize(src_reader.GetPageSizeWithRotation(1));

    aCopy.SetTagged();
    aCopy.UserProperties = true;
    aCopy.PdfVersion = PdfCopy.VERSION_1_7;
    aCopy.ViewerPreferences = PdfCopy.DisplayDocTitle;
    aCopy.CreateXmpMetadata();
    aCopy.XmpWriter.SetProperty(PdfAXmpWriter.zugferdSchemaNS, PdfAXmpWriter.zugferdDocumentFileName, "ZUGFeRD-invoice.xml");

    // From here on, no more metadata can be written.
    doc.Open();

    // Read the ICC profile and release the file handle afterwards.
    ICC_Profile icc;
    using (FileStream iccStream = new FileStream(ICM, FileMode.Open, FileAccess.Read))
    {
        icc = ICC_Profile.GetInstance(iccStream);
    }
    aCopy.SetOutputIntents("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", icc);

    // [...add the dictionary to doc..]
    aCopy.AddDocument(src_reader);
    doc.Close();

One more question: addDocument works, but when I use copy.addPage(copy.getImportedPage(src_reader, i)) instead, the error "the document has no pages" is thrown. Why?

AndreasGloeckner asked Mar 13 '23 22:03



1 Answer

1. Can you convert a regular PDF to a PDF/A document?

The answer is: it depends.

PDF/A is a subset of PDF that involves some obligations (e.g. all fonts must be embedded) and restrictions (e.g. no JavaScript is allowed). iText can't "automatically" convert a regular PDF to PDF/A, for a number of reasons. For instance, if a font is not embedded, iText doesn't know which font to substitute for it, nor where to find the necessary font program. This usually requires human intervention, because replacing one font with an arbitrary other font usually results in very ugly PDFs.

It depends, because some people do use iText to convert PDF to PDF/A, but this involves a lot of programming and human decisions. I see that you succeeded using Ghostscript. In that case, Ghostscript makes some of those decisions for you. This can lead to acceptable results, but in some cases the result will not be acceptable (e.g. very odd-looking PDFs if the fonts don't match).

2. Can you convert a PDF/A-1 file to a PDF/A-3 file?

The PDF/A standard is written in such a way that older versions of the PDF/A specification never become outdated. Newer versions only add functionality. For instance, PDF/A-1 is based on the PDF 1.4 specification, whereas PDF/A-2 is based on PDF 1.7. Optional content (optional content groups, OCG) was introduced in PDF 1.5, so support for OCG is one of the differences between PDF/A-2 and PDF/A-1.

This means that every file that conforms to PDF/A-1 automatically conforms to PDF/A-2. However, a PDF/A-2 file could contain functionality that isn't supported in PDF/A-1.

3. What is the difference between PDF/A-2 and PDF/A-3?

PDF/A-2 and PDF/A-3 are identical, except for one difference: a PDF/A-3 file can have attachments that aren't PDF/A files. For instance, a PDF/A-3 file can have a Word file, an XLS file, or a plain text file as an attachment. You mention ZUGFeRD: in that case, the PDF/A-3 file has at least an XML file as an attachment.
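
The relationships described in points 2 and 3 can be sketched as a tiny model. This is an illustration only: it is not iText API and not a validator, and it encodes nothing beyond the two statements above (newer parts only add functionality; only PDF/A-3 allows attachments that aren't PDF/A files). The names `PdfAPart`, `PdfAModel`, `ConformsTo` and `AllowsArbitraryAttachments` are hypothetical.

```csharp
using System;

// Hypothetical model of the PDF/A parts discussed above.
public enum PdfAPart { A1 = 1, A2 = 2, A3 = 3 }

public static class PdfAModel
{
    // A file written for an older part is also accepted under a newer part,
    // because newer parts only add functionality.
    public static bool ConformsTo(PdfAPart filePart, PdfAPart targetPart)
    {
        return (int)filePart <= (int)targetPart;
    }

    // Only PDF/A-3 permits attachments that are not themselves PDF/A files.
    public static bool AllowsArbitraryAttachments(PdfAPart part)
    {
        return part == PdfAPart.A3;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(PdfAModel.ConformsTo(PdfAPart.A1, PdfAPart.A3));         // True
        Console.WriteLine(PdfAModel.ConformsTo(PdfAPart.A3, PdfAPart.A1));         // False
        Console.WriteLine(PdfAModel.AllowsArbitraryAttachments(PdfAPart.A2));      // False
    }
}
```

In this model a PDF/A-1 source is acceptable as input for a PDF/A-3 target, and the XML invoice that ZUGFeRD requires is only possible in the PDF/A-3 case.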

Summarized:

This is a broad answer to a broad question (your question goes in many different directions, so it's hard to give you a specific answer). Why don't you use the ZUGFeRD support that is already built in to create the invoices? Read "ZUGFeRD, the future of invoicing" for more info.

Bruno Lowagie answered Mar 23 '23 09:03
