
How to create a preview image from a Microsoft Office document using Java

I am currently working with Microsoft Office documents: Word (doc, docx), PowerPoint (ppt, pptx), and Excel (xls, xlsx).

I would like to create a preview image from each document's first page.

With the Apache POI library, I could only get this working for PowerPoint documents.

I cannot find a solution for the other types.

My idea is to convert the document to PDF (1) and then convert the PDF to an image (2).

For step 2 (PDF to image), there are many free Java libraries, e.g. PDFBox. It works fine with my dummy PDF file.
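For reference, once step 2 yields a java.awt.image.BufferedImage (PDFBox's PDFRenderer returns one), shrinking it to preview size needs nothing beyond the JDK. The following is only a minimal sketch: the class name PreviewScaler and the 300-pixel target width are illustrative assumptions, and the blank page in main stands in for a real rendered page.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper: scales a rendered page down to a preview thumbnail.
final class PreviewScaler {

    /** Scales the page so its width equals targetWidth, keeping the aspect ratio. */
    static BufferedImage scaleToWidth(BufferedImage page, int targetWidth) {
        int targetHeight = (int) Math.max(1,
                Math.round((double) page.getHeight() * targetWidth / page.getWidth()));
        BufferedImage preview =
                new BufferedImage(targetWidth, targetHeight, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = preview.createGraphics();
        try {
            // Bilinear interpolation gives smoother text than the default nearest-neighbor.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(page, 0, 0, targetWidth, targetHeight, null);
        } finally {
            g.dispose();
        }
        return preview;
    }

    public static void main(String[] args) {
        // Stand-in for a rendered first page (assumption: 1200 x 1600 pixels, all white).
        BufferedImage page = new BufferedImage(1200, 1600, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = page.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, page.getWidth(), page.getHeight());
        g.dispose();

        BufferedImage preview = scaleToWidth(page, 300);
        System.out.println(preview.getWidth() + " x " + preview.getHeight());
    }
}
```

The result can be written out with javax.imageio.ImageIO.write(preview, "png", file). For large reduction factors, scaling in several steps tends to look better than a single pass, which this sketch does not attempt.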

However, I have a problem with step 1.

My documents may contain text in several styles, tables, images, or other objects. Here is a sample first page of a Word document:

[image: sample first page of a Word document]

Which open source Java library can do this task?

I have tried the following libraries:

JODConverter - The output looks fine, but it requires OpenOffice.

docx4j - I'm not sure whether it works with non-OOXML formats (doc, xls, ppt), and is it really free? Here is my example code:

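import java.io.*;
import org.docx4j.Docx4J;
import org.docx4j.fonts.IdentityPlusMapper;
import org.docx4j.fonts.Mapper;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

String inputWordPath = "C:\\Users\\test\\Desktop\\TestPDF\\Docx.docx";
String outputPDFPath = "C:\\Users\\test\\Desktop\\TestPDF\\OutDocx4j.pdf";
// try-with-resources closes both streams even if the conversion fails
try (InputStream is = new FileInputStream(new File(inputWordPath));
     OutputStream os = new FileOutputStream(new File(outputPDFPath))) {
    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(is);
    // Map the fonts named in the document to fonts available on this machine
    Mapper fontMapper = new IdentityPlusMapper();
    wordMLPackage.setFontMapper(fontMapper);
    Docx4J.toPDF(wordMLPackage, os);
} catch (Exception e) {
    e.printStackTrace();
}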

The output looks OK, but the generated PDF contains "## Evaluation Use only ##".

xdocreport - The generated PDF does not contain the images.

String inputWordPath = "C:\\Users\\test\\Desktop\\TestPDF\\Docx.docx";
String outputPDFPath = "C:\\Users\\test\\Desktop\\TestPDF\\OutXDOCReport.pdf";
// try-with-resources closes both streams even if the conversion fails
try (InputStream is = new FileInputStream(new File(inputWordPath));
     OutputStream out = new FileOutputStream(new File(outputPDFPath))) {
    XWPFDocument document = new XWPFDocument(is);
    PdfOptions options = PdfOptions.create();
    PdfConverter.getInstance().convert(document, out, options);
}

I cannot find a suitable library for this task.

  • Do you have any suggestions?

  • Can I convert a document (docx, doc, xlsx, xls) to an image directly?

  • Is docx4j's conversion feature really free?

  • How can I remove "## Evaluation Use only ##" from the PDF generated by docx4j?

  • Can docx4j work with non-OOXML documents?

  • Can I convert only the first page to PDF?

  • Can I set the size of the PDF to fit the converted document's content?

  • Are there any libraries and example code for converting a document to PDF or to an image?
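Several of these questions depend on whether a file is OOXML or a legacy binary format. OOXML files (docx, xlsx, pptx) are ZIP packages, while doc, xls, and ppt files are OLE2 compound files, so the first few bytes are enough to tell the two families apart before choosing a converter. Below is a rough sketch of that check; the class name OfficeFormatSniffer and its enum are made up for illustration. Note the limits: a ZIP signature does not prove the file is OOXML (any ZIP archive matches), and password-protected OOXML files are stored inside an OLE2 container.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: guesses the container family from the file signature.
final class OfficeFormatSniffer {

    enum Container { ZIP_BASED, OLE2_BASED, UNKNOWN }

    // Local file header signature of a ZIP archive: "PK\3\4".
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};
    // Signature of an OLE2 compound file (doc, xls, ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    static Container sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Container.OLE2_BASED;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Container.ZIP_BASED;
        }
        return Container.UNKNOWN;
    }

    static Container sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[OLE2_MAGIC.length];
            // readNBytes(byte[], int, int) requires Java 9 or later.
            int read = in.readNBytes(header, 0, header.length);
            return sniff(Arrays.copyOf(header, read));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] zipHeader = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00};
        System.out.println(sniff(zipHeader));
    }
}
```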

asked Nov 27 '17 at 11:11 by nilez


2 Answers

If you can afford having a LibreOffice (or Apache OpenOffice) installation, JODConverter should do the trick just fine (and for free).

Note that the latest version of JODConverter available in the Maven Central Repository offers a feature called Filters that lets you convert only the first page easily, and it supports conversion to PNG out of the box. Here's a quick example of how to do so:

// Create an office manager using the default configuration.
// The default port is 2002. Note that when an office manager
// is installed, it will be the one used by default when
// a converter is created.
final LocalOfficeManager officeManager = LocalOfficeManager.install(); 
try {

    // Start an office process and connect to the started instance (on port 2002).
    officeManager.start();

    final File inputFile = new File("document.docx");
    final File outputFile = new File("document.png");

    // Create a page selector filter in order to
    // convert only the first page.
    final PageSelectorFilter selectorFilter = new PageSelectorFilter(1);

    LocalConverter
      .builder()
      .filterChain(selectorFilter)
      .build()
      .convert(inputFile)
      .to(outputFile)
      .execute();
} finally {
    // Stop the office process
    LocalOfficeUtils.stopQuietly(officeManager);
}
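If pulling in JODConverter is not an option, the same office installation can also be driven from its command line: soffice accepts the --headless, --convert-to, and --outdir options. The sketch below only assembles and runs that command; it assumes a soffice binary is reachable on the PATH, and the class name SofficeCli, the method names, and the file names are all made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs a headless LibreOffice/OpenOffice conversion.
final class SofficeCli {

    /** Builds: soffice --headless --convert-to FORMAT --outdir DIR INPUT */
    static List<String> buildCommand(String sofficeBinary, String targetFormat,
                                     Path outDir, Path input) {
        return Arrays.asList(
                sofficeBinary,
                "--headless",
                "--convert-to", targetFormat,
                "--outdir", outDir.toString(),
                input.toString());
    }

    /** Runs the conversion; returns true only if the process exits with code 0 in time. */
    static boolean convert(String sofficeBinary, String targetFormat, Path outDir,
                           Path input, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(
                buildCommand(sofficeBinary, targetFormat, outDir, input))
                .inheritIO()
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            // Do not leave a hung office process behind.
            process.destroyForcibly();
            return false;
        }
        return process.exitValue() == 0;
    }

    public static void main(String[] args) {
        // Only prints the command; call convert(...) to actually run it.
        List<String> command = buildCommand(
                "soffice", "pdf", Paths.get("out"), Paths.get("document.docx"));
        System.out.println(String.join(" ", command));
    }
}
```

Unlike JODConverter, this starts a new office process for every file and offers no filter chain, so it is better suited to occasional conversions than to a busy server.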

As for your question:

"Can I set the size of the PDF to fit the converted document's content?"

If you can do it using LibreOffice or Apache OpenOffice without JODConverter, then you can do it with JODConverter. You just have to find out how it can be done programmatically, and then create a filter to use with JODConverter.

I won't go into details here since you may choose another approach, but if you need further assistance, just ask in the project's Gitter community.

answered Nov 17 '22 at 17:11 by sbraconnier


You can try the GroupDocs.Conversion Cloud SDK for Java; its free plan provides 50 free credits per month. It supports conversion between all common file formats.

Sample code that converts a DOCX file to an image stream:

// Get App Key and App SID from https://dashboard.groupdocs.cloud/
ConvertApi apiInstance = new ConvertApi(AppSID,AppKey);
try {

    ConvertSettings settings = new ConvertSettings();

    settings.setStorageName(Utils.MYStorage);
    settings.setFilePath("conversions\\password-protected.docx");
    settings.setFormat("jpeg");

    DocxLoadOptions loadOptions = new DocxLoadOptions();
    loadOptions.setPassword("password");
    loadOptions.setHideWordTrackedChanges(true);
    loadOptions.setDefaultFont("Arial");

    settings.setLoadOptions(loadOptions);

    JpegConvertOptions convertOptions = new JpegConvertOptions();
    convertOptions.setFromPage(1);
    convertOptions.setPagesCount(1);
    convertOptions.setGrayscale(false);
    convertOptions.setHeight(1024);
    convertOptions.setQuality(100);
    convertOptions.setRotateAngle(90);
    convertOptions.setUsePdf(false);
    settings.setConvertOptions(convertOptions);

    // Leaving OutputPath empty returns the converted output as a stream
    settings.setOutputPath("");

    // convert to specified format
    File response = apiInstance.convertDocumentDownload(new ConvertDocumentRequest(settings));
    System.out.println("Document converted successfully: " + response.length());
} catch (ApiException e) {
    System.err.println("Exception while calling ConvertApi:");
    e.printStackTrace();
}

I am a developer evangelist at Aspose.

answered Nov 17 '22 at 17:11 by Tilal Ahmad