 

Apache POI - Docx output issue

I am evaluating Apache POI as an option for writing .docx files. Specifically, I need to generate content in the .docx file in different languages (Hindi/Marathi, to be specific). I am facing the following issue:

When the .docx file is written, the Hindi/Marathi text appears as square boxes, even though the font "Arial Unicode MS" supports these characters. When I select the boxes, MS Word shows the font as "Calibri", even though I have explicitly set the font to "Arial Unicode MS". If I select the boxes in MS Word and change the font to "Arial Unicode MS", the Hindi/Marathi words are displayed correctly. Any idea why this happens? Please note that I am using a development version of POI, because the previous stable version did not support setting font families. Here is the source:

import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

public class CreateDocumentFromScratch 
{

    public static void main(String[] args) 
    {
        XWPFDocument document = new XWPFDocument();

        // Devanagari text, explicitly set to "Arial Unicode MS"
        XWPFParagraph paragraphTwo = document.createParagraph();
        XWPFRun paragraphTwoRunOne = paragraphTwo.createRun();
        paragraphTwoRunOne.setFontFamily("Arial Unicode MS");
        paragraphTwoRunOne.setText("नसल्यास");

        // English text in the same font, for comparison
        XWPFParagraph paragraphThree = document.createParagraph();
        XWPFRun paragraphThreeRunOne = paragraphThree.createRun();
        paragraphThreeRunOne.setFontFamily("Arial Unicode MS");
        paragraphThreeRunOne.setText("This is nice");

        // XWPFDocument writes the OOXML format, so the file gets a .docx extension.
        // try-with-resources closes the stream even if write() fails, and write()
        // is never called on a null stream when the file cannot be opened.
        // (FileNotFoundException is a subclass of IOException.)
        try (FileOutputStream outStream = new FileOutputStream("c:/will/First.docx")) {
            document.write(outStream);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

}
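To see which font attributes a given POI build actually writes, you can look inside the generated file: a .docx is a ZIP package whose body text lives in word/document.xml, and each run's fonts are recorded in a w:rFonts element with separate attributes (w:ascii, w:hAnsi, w:eastAsia, w:cs) for different character ranges. The JDK-only sketch below lists those elements. The class and method names are hypothetical, and it assumes the document uses the conventional w: namespace prefix, as POI-generated files normally do. Assuming Word takes the font for Devanagari text from the complex-script (w:cs) slot, a run holding Hindi/Marathi text with no w:cs attribute would fall back to Word's default complex-script font, which may be consistent with the behavior described above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical diagnostic helper; not part of Apache POI.
public class RunFontInspector {

    // Matches each <w:rFonts .../> element (assumes the usual "w:" prefix).
    private static final Pattern RFONTS = Pattern.compile("<w:rFonts\\b[^>]*>");

    /** Returns the raw w:rFonts elements found in word/document.xml of a .docx file. */
    public static List<String> rFontsElements(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("No word/document.xml in " + docxPath);
            }
            String xml;
            try (InputStream in = zip.getInputStream(entry)) {
                xml = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
            List<String> result = new ArrayList<>();
            Matcher m = RFONTS.matcher(xml);
            while (m.find()) {
                result.add(m.group());
            }
            return result;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RunFontInspector <path-to-docx>");
            return;
        }
        for (String element : rFontsElements(args[0])) {
            // Flag runs that do not name a complex-script font explicitly.
            String note = element.contains("w:cs=") ? "" : "   <-- no w:cs attribute";
            System.out.println(element + note);
        }
    }
}
```

Run it with the path of the file produced by the code above; comparing the output of different POI builds shows whether a build sets only some of the font attributes.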

Any help will be appreciated.

asked Feb 13 '12 09:02 by Will



1 Answer

To resurrect a very old post: can the OP confirm which version of MS Office is being used? The problem appears to occur with MS Office 2003 running on Windows XP, but it may also occur on later OS versions.

MS Word appears to apply the Mangal font to Hindi script [Encoding standard: Indic: Hindi ISCII 57002 (Devanagari)]. The following link explains this:

https://support.office.com/en-ca/article/Choose-text-encoding-when-you-open-and-save-files-60d59c21-88b5-4006-831c-d536d42fd861
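Since Word treats Hindi (Devanagari) text as a script of its own with a dedicated font such as Mangal, it can help to know which strings in a generated document actually contain Devanagari, for example to decide which runs need a Devanagari-capable font. A small JDK-only sketch based on `Character.UnicodeScript`; the class and method names are hypothetical:

```java
// Hypothetical helper; uses only java.lang.Character.UnicodeScript (Java 7+).
public class ScriptCheck {

    /** True if any code point in the text belongs to the Devanagari script (Hindi, Marathi, ...). */
    public static boolean containsDevanagari(String text) {
        return text.codePoints()
                   .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.DEVANAGARI);
    }

    public static void main(String[] args) {
        // "\u0928\u0938\u0932\u094D\u092F\u093E\u0938" is the word from the question.
        String[] samples = {"\u0928\u0938\u0932\u094D\u092F\u093E\u0938", "This is nice"};
        for (String s : samples) {
            System.out.println(s + " -> " + (containsDevanagari(s) ? "Devanagari" : "no Devanagari"));
        }
    }
}
```

Because the check runs per code point, mixed strings that combine Latin and Devanagari characters are also reported as containing Devanagari.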

Suggested workaround:

1. In the Windows XP Control Panel, open Regional and Language Options.
2. Select the Languages tab.
3. Check the box "Install files for complex script and right-to-left languages (including Thai)".
4. Restart the PC.

However, no such problem was observed when opening the file with LibreOffice 4.3.5.2 on Windows or LibreOffice 4.2.7.2 on Linux (Ubuntu).

I used the following libraries: poi-3.10-FINAL-20140208.jar, poi-ooxml-3.10-FINAL-20140208.jar, poi-ooxml-schemas-3.10-FINAL-20140208.jar, xmlbeans-2.3.0.jar, dom4j-1.6.1.jar, stax-api-1.0.1.jar

answered Oct 06 '22 23:10 by Satadru
