
PDF-Forms with Unicode chars [closed]

I am currently struggling with filling a PDF form created from a LibreOffice document.

I created it as suggested in the book "iText in Action" and am now trying to pre-fill the embedded form with a few values that can contain Unicode chars.

This includes characters that consist of a base char with an additional combining char (e.g. M̂).
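A minimal sketch of why this particular character is awkward, using only `java.text.Normalizer` from the JDK: Unicode has a precomposed form for "a" plus the combining circumflex, but none for "M" plus the combining circumflex, so normalizing to NFC cannot collapse M̂ into a single code point. The class and method names here are made up for illustration.

```java
import java.text.Normalizer;

// Hypothetical helper class, not part of the original post.
class CombiningCheck {
    // Number of Unicode code points left after NFC normalization.
    static int nfcCodePoints(String s) {
        String nfc = Normalizer.normalize(s, Normalizer.Form.NFC);
        return nfc.codePointCount(0, nfc.length());
    }

    public static void main(String[] args) {
        // 'a' + U+0302 composes to the single code point U+00E2.
        System.out.println(nfcCodePoints("a\u0302")); // prints 1
        // 'M' + U+0302 has no precomposed form, so both code points remain.
        System.out.println(nfcCodePoints("M\u0302")); // prints 2
    }
}
```

So the font and the viewer both have to handle the base glyph plus the combining mark; normalizing the input beforehand does not remove the problem.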

I have tried several hints I found on Stack Overflow and in the book, but I never got a PDF document with a form that works on all platforms and readers: Linux (Okular, Evince), Acrobat DC, macOS Preview, etc.

I'm aware that I need a font that covers the chars and that I have to embed the font fully. Below is the code I used to fill the PDF document, followed by the PDF files.

My questions are:

  • Is the differing behavior of the PDF readers due to a weakness in the PDF specification that I have to live with?
  • Especially the Linux PDF readers and Acrobat behave badly. Are there known bugs?
  • I'm not very familiar with the internals of PDF, so any suggestions? Are the contents of my PDF files OK?
  • Any suggestions on how to improve the code to get better results?

Code to fill the form:

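import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;

BaseFont uniFont = BaseFont.createFont("./src/main/resources/UnicodeDoc.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED, false, null, null, false);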
uniFont.setSubset(false);

// Debugging code...
for (String codepage : uniFont.getCodePagesSupported()) {
    System.out.println("Codepage = " + codepage);
}

FileInputStream fis = new FileInputStream(src);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfReader reader = new PdfReader(fis);
PdfStamper stamper = new PdfStamper(reader, baos);

// Fill all fields in PDF form
String text = "aM\u0302a"; // Same as "aM̂a"
com.itextpdf.text.pdf.AcroFields form = stamper.getAcroFields();
for (String fname : form.getFields().keySet()) {
    System.out.println("form." + fname);
    form.setField(fname, text);
    form.setFieldProperty(fname, "textfont", uniFont, null);
}
form.setGenerateAppearances(true);
form.addSubstitutionFont(uniFont);
stamper.setFormFlattening(false);
stamper.close();
reader.close();
  • Template
  • Template filled
  • Font

Thanks in advance, Mik86

asked Jan 27 '18 14:01 by Mik86



1 Answer

"I'm not very familiar with internals of PDF, so any suggestions? Are the contents of my PDF files ok?"

I'll have to dig into the PDF specification to see if there is something definitively incorrect going on, but to me there does appear to be some confusion over which font the form uses.

Firstly, your input Template gives me an error when I attempt to open it in Acrobat, and LiveCycle complains that "UnicodeDoc" must be swapped out for a different font. "UnicodeDoc" is the font used within the original input file.

Note that the font "UnicodeDoc" is not embedded in your input file. When filling in the form you create and embed a font, but it looks like you don't overwrite the original (again, not to say this is correct or incorrect).

Without going too much into the inner workings of PDFs, the form field that is getting filled out still links to the original font, which isn't embedded.
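As far as I understand it, a text field names its font in its default appearance string (/DA) through the Tf operator, and a viewer that rebuilds the field's appearance looks that resource name up. A rough sketch of pulling the name out of such a string with plain JDK regex; the sample /DA value and the class name are made up for illustration and were not read from your template.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class DefaultAppearance {
    // Matches the "/Name size Tf" part of a /DA string.
    private static final Pattern TF =
            Pattern.compile("/(\\S+)\\s+([0-9.]+)\\s+Tf");

    // Returns the font resource name from a /DA string, or null if none is found.
    static String fontResourceName(String da) {
        Matcher m = TF.matcher(da);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up /DA value; the real one sits in the template's field dictionary.
        System.out.println(fontResourceName("/UnicodeDoc 12 Tf 0 g")); // prints UnicodeDoc
    }
}
```

If the name found this way still points at the non-embedded font after filling, the embedded font is not the one the field asks for.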

This doesn't necessarily address the issue directly, but if I "fix" your document by removing the font from the original template (input.pdf) and run it through your code, it produces output.pdf, which has the correct output in Acrobat and Reader.

Again, this isn't to say your PDF is wrong or iText is wrong in this case, as I haven't looked through the entire specification to see what (if any) interaction is expected here, but as it stands, the font that you are embedding is not the font that ends up getting used in the form field.

answered Sep 18 '22 14:09 by Jon Reilly