
Write Cyrillic characters into PDF form fields with PDFBox

I am using PDFBox 2.0.5 to fill out the form fields of a PDF document with this code:

        doc = PDDocument.load(inputStream);
        PDDocumentCatalog catalog = doc.getDocumentCatalog();
        PDAcroForm form = catalog.getAcroForm();
        for (PDField field : form.getFieldTree()){
            field.setValue("должен");
        }

I get this error: U+0434 ('afii10069') is not available in this font Times-Roman (generic: TimesNewRomanPSMT) encoding: StandardEncoding with differences
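The error message above says that the field's font, with its single-byte Latin encoding, has no slot for U+0434. As a rough illustration only, the sketch below uses plain Java and ISO-8859-1 as a stand-in for such an encoding (it is not PDF's StandardEncoding and not the check PDFBox performs) to list which code points of a string fall outside it. The class name `EncodingCheck` and the method `unencodable` are hypothetical.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class EncodingCheck {

    // Returns the code points of `text` that ISO-8859-1 cannot represent.
    // ISO-8859-1 is only a stand-in for a single-byte Latin font encoding.
    static List<String> unencodable(String text) {
        CharsetEncoder latin = StandardCharsets.ISO_8859_1.newEncoder();
        List<String> missing = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            String s = new String(Character.toChars(cp));
            if (!latin.canEncode(s)) {
                missing.add(String.format("U+%04X", cp));
            }
        });
        return missing;
    }

    public static void main(String[] args) {
        // The Cyrillic sample string from the question, written as escapes
        System.out.println(unencodable("\u0434\u043e\u043b\u0436\u0435\u043d"));
    }
}
```

For the string from the question, the first code point reported is U+0434, the same one named in the error message. Accented Latin letters such as ö pass this check, which is why only the Cyrillic text triggers the problem.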

The PDF document itself contains Cyrillic text, which is displayed fine. I have tried using different fonts. For "Arial Unicode MS", it wants to download a 50 MB "Adobe Acrobat Reader DC Font Pack". Is this a requirement for Cyrillic characters?

Which font do I have to specify in the text field to handle Cyrillic (or Asian) characters?

Thanks, Ropo

ropo asked Mar 20 '17 12:03




1 Answer

Adobe handles this by reusing the font file embedded in the /Ubuntu font and creating a new font resource from it. Here is a quick hack that can serve as a guide to achieving something similar. The code is specific to a sample document I have, so the font name (Ubuntu) and the field name (Text2) will differ for your file.

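import java.io.File;
import org.apache.fontbox.ttf.TrueTypeFont;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDTrueTypeFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDTextField;

PDDocument doc = PDDocument.load(new File(...));
PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
PDResources formResources = acroForm.getDefaultResources();
// look up the TrueType font that is already embedded in the form's default resources
PDTrueTypeFont font = (PDTrueTypeFont) formResources.getFont(COSName.getPDFName("Ubuntu"));

// here is the 'magic' to reuse the font as a new font resource
TrueTypeFont ttFont = font.getTrueTypeFont();

// load the same font program as a Type 0 (composite) font, which can address Unicode text
PDFont font2 = PDType0Font.load(doc, ttFont, true);
ttFont.close();

// register the new font in the form resources under the name F0
formResources.put(COSName.getPDFName("F0"), font2);

PDTextField formField = (PDTextField) acroForm.getField("Text2");
// point the field's default appearance at the new font
formField.setDefaultAppearance("/F0 0 Tf 0 g");
formField.setValue("öäüинформацию");

doc.save(...);
doc.close();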
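
The string passed to setDefaultAppearance above is a small piece of PDF content-stream syntax: `/F0 0 Tf` selects the font resource named F0 at size 0 (which form viewers treat as auto-size), and `0 g` sets the fill colour to black. As a minimal sketch, a helper like the following could assemble such a string for other resource names or sizes. The class `DefaultAppearance` and its `build` method are hypothetical and not part of PDFBox.

```java
public class DefaultAppearance {

    // Builds a default appearance string of the form "/<fontName> <size> Tf <gray> g".
    // fontName is the resource name without the leading slash, e.g. "F0".
    static String build(String fontName, float fontSize, float gray) {
        return "/" + fontName + " " + num(fontSize) + " Tf " + num(gray) + " g";
    }

    // Formats whole numbers without a trailing ".0" so the output stays compact.
    private static String num(float v) {
        return v == Math.rint(v) ? String.valueOf((long) v) : String.valueOf(v);
    }

    public static void main(String[] args) {
        // Same string as used in the answer's code
        System.out.println(build("F0", 0, 0));
    }
}
```

The result would be passed to `formField.setDefaultAppearance(...)` in place of the hard-coded literal. The font name must match the key under which the font was put into the form's default resources.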
Maruan Sahyoun answered Sep 16 '22 16:09