
iText PDF not displaying Chinese characters when using Noto fonts or Source Han Sans

Tags: java, pdf, itext

I am trying to use Noto fonts (https://www.google.com/get/noto/) to display Chinese characters. Here is my sample code, modified from an iText sample.

public void createPdf(String filename) throws IOException, DocumentException {

    Document document = new Document();
    PdfWriter.getInstance(document, new FileOutputStream(filename));
    document.open();

    //This is simple English Font
    FontFactory.register("c:/temp/fonts/NotoSerif-Bold.ttf", "my_nato_font");
    Font myBoldFont = FontFactory.getFont("my_nato_font");
    BaseFont bf = myBoldFont.getBaseFont();
    document.add(new Paragraph(bf.getPostscriptFontName(), myBoldFont));


    //This is Chinese font


    //Option 1 :
    Font myAdobeTypekit = FontFactory.getFont("SourceHanSansSC-Regular", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);

    //Option 2 :
     /*FontFactory.register("C:/temp/AdobeFonts/source-han-sans-1.001R/OTF/SimplifiedChinese/SourceHanSansSC-Regular.otf", "my_hans_font");
     Font myAdobeTypekit = FontFactory.getFont("my_hans_font", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);*/



    document.add(Chunk.NEWLINE);
    document.add(new Paragraph("高興", myAdobeTypekit));
    document.add(Chunk.NEWLINE);

    //simplified chinese
    document.add(new Paragraph("朝辞白帝彩云间", myAdobeTypekit));
    document.add(Chunk.NEWLINE);

    document.add(new Paragraph("高兴", myAdobeTypekit));
    document.add(new Paragraph("The Source Han Sans Traditional Chinese ", myAdobeTypekit));


    document.close();
}

I have downloaded the font files to my machine. I am trying two approaches:

  1. Reference the equivalent Adobe font family without embedding it

  2. Embed the OTF file in the PDF

Using approach 1, I would expect the Chinese characters to be displayed in the PDF, but only the English text appears; the Chinese characters are blank.

Using approach 2, where I embed the font in the PDF (which is not the path I would like to take), the PDF fails to open with an error. [image: screenshot of the error]

Update: I also looked at this example: http://itextpdf.com/examples/iia.php?id=214

In this code:

public void createPdf(String filename, boolean appearances, boolean font)
    throws IOException, DocumentException {
    // step 1
    Document document = new Document();
    // step 2
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(filename));
    // step 3
    document.open();
    // step 4
    writer.getAcroForm().setNeedAppearances(appearances);
    TextField text = new TextField(writer, new Rectangle(36, 806, 559, 780), "description");
    text.setOptions(TextField.MULTILINE);
    if (font) {
        BaseFont unicode =
            BaseFont.createFont("c:/windows/fonts/arialuni.ttf", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
        text.setExtensionFont(BaseFont.createFont());
        ArrayList<BaseFont> list = new ArrayList<BaseFont>();
        list.add(unicode);
        text.setSubstitutionFonts(list);
        BaseFont f= (BaseFont)text.getSubstitutionFonts().get(0);
        System.out.println(f.getPostscriptFontName());

    }
    text.setText(TEXT);

    writer.addAnnotation(text.getTextField());
    // step 5
    document.close();
}

When I replace c:/windows/fonts/arialuni.ttf with C:/temp/fonts/NotoSansCJKtc-Thin.otf, I do not see the Chinese characters. The text to render is now:

public static final String TEXT = "These are the protagonists in 'Hero', a movie by Zhang Yimou:\n"
    + "\u7121\u540d (Nameless), \u6b98\u528d (Broken Sword), "
    + "\u98db\u96ea (Flying Snow), \u5982\u6708 (Moon), "
    + "\u79e6\u738b (the King), and \u9577\u7a7a (Sky).";
asked Mar 24 '15 at 16:03 by vsingh



1 Answer

Clearly you are using the wrong font. I have downloaded the fonts from the link you posted. You are using NotoSerif-Bold.ttf, a font that does not support Chinese. However, the ZIP file also contains fonts with CJK in the font name. As described on the site you refer to, CJK stands for Chinese, Japanese and Korean. Use one of those CJK fonts and you'll be able to produce Chinese text in your PDF.
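One way to confirm that a font file lacks the glyphs you need, before generating any PDF, is to ask the JDK. The sketch below is only a rough heuristic: it uses java.awt.Font rather than iText, so its answer may differ from what iText embeds, and AWT may fail to load some OpenType files. The class name FontCoverageCheck and its method names are hypothetical helpers, not part of iText.

```java
import java.awt.Font;
import java.awt.FontFormatException;
import java.io.File;
import java.io.IOException;
import java.util.function.IntPredicate;

/** Hypothetical helper (not part of iText): rough glyph-coverage check. */
class FontCoverageCheck {

    /**
     * Returns the first non-whitespace code point that canDisplay rejects,
     * or -1 if every code point in the text is accepted.
     */
    static int firstUnsupported(String text, IntPredicate canDisplay) {
        return text.codePoints()
                .filter(cp -> !Character.isWhitespace(cp))
                .filter(cp -> !canDisplay.test(cp))
                .findFirst()
                .orElse(-1);
    }

    /** Loads the font file with AWT (an assumption: not iText's parser) and runs the check. */
    static int firstUnsupportedInFontFile(String text, File fontFile)
            throws IOException, FontFormatException {
        Font font = Font.createFont(Font.TRUETYPE_FONT, fontFile);
        return firstUnsupported(text, cp -> font.canDisplay(cp));
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.out.println("usage: java FontCoverageCheck <path-to-font-file>");
            return;
        }
        // One of the Chinese sample strings from the question, written as escapes.
        String text = "\u9ad8\u8208";
        int missing = firstUnsupportedInFontFile(text, new File(args[0]));
        if (missing < 0) {
            System.out.println("AWT reports a glyph for every character");
        } else {
            System.out.printf("No glyph for U+%04X%n", missing);
        }
    }
}
```

Running it once against NotoSerif-Bold.ttf and once against one of the CJK font files should show whether the font itself is the limiting factor.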

Take a look at the NotoExample in which I use one of the fonts from the ZIP file you refer to. It creates a PDF that looks like this:

[image: screenshot of the generated PDF]

This is the code I used:

public static final String FONT = "resources/fonts/NotoSansCJKsc-Regular.otf";
public static final String TEXT = "These are the protagonists in 'Hero', a movie by Zhang Yimou:\n"
    + "\u7121\u540d (Nameless), \u6b98\u528d (Broken Sword), "
    + "\u98db\u96ea (Flying Snow), \u5982\u6708 (Moon), "
    + "\u79e6\u738b (the King), and \u9577\u7a7a (Sky).";
public static final String CHINESE = "\u5341\u950a\u57cb\u4f0f";
public static final String JAPANESE = "\u8ab0\u3082\u77e5\u3089\u306a\u3044";
public static final String KOREAN = "\ube48\uc9d1";

public void createPdf(String dest) throws IOException, DocumentException {
    Document document = new Document();
    PdfWriter.getInstance(document, new FileOutputStream(dest));
    document.open();
    Font font = FontFactory.getFont(FONT, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
    Paragraph p = new Paragraph(TEXT, font);
    document.add(p);
    document.add(new Paragraph(CHINESE, font));
    document.add(new Paragraph(JAPANESE, font));
    document.add(new Paragraph(KOREAN, font));
    document.close();
}
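The sample strings in the code above exercise more than one writing system, which is why a CJK font is needed rather than a Latin-only one. As a minimal sketch, the JDK can list the Unicode scripts present in a string, which helps when deciding what a font has to cover. The class name ScriptScan is hypothetical, and the check says nothing about any particular font or about iText.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical helper: lists the Unicode scripts that occur in a string. */
class ScriptScan {

    /** Collects the script of every code point in the text. */
    static Set<Character.UnicodeScript> scriptsIn(String text) {
        Set<Character.UnicodeScript> scripts = EnumSet.noneOf(Character.UnicodeScript.class);
        text.codePoints().forEach(cp -> scripts.add(Character.UnicodeScript.of(cp)));
        return scripts;
    }

    public static void main(String[] args) {
        // The CHINESE, JAPANESE and KOREAN sample strings from the answer.
        System.out.println(scriptsIn("\u5341\u950a\u57cb\u4f0f"));
        System.out.println(scriptsIn("\u8ab0\u3082\u77e5\u3089\u306a\u3044"));
        System.out.println(scriptsIn("\ube48\uc9d1"));
    }
}
```

A string that reports HAN, HIRAGANA or HANGUL needs a font with glyphs for those scripts; NotoSerif-Bold.ttf, according to the answer, is not such a font.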

You claim that Adobe Reader XI doesn't show the Chinese glyphs, but instead shows a "Cannot extract the embedded Font" message. I cannot reproduce this [*]. I have even used Preflight in Adobe Acrobat as indicated here, but no errors were found:

[image: screenshot of the Preflight result]

[*] Update: this problem can be reproduced if you use iText 4.2.x, a version that was released by somebody unknown to iText Group NV. Please use iText versions higher than 5 only.

answered Oct 19 '22 at 23:10 by Bruno Lowagie
