 

Apache POI or docx4j for dealing with docx documents [closed]


Which do you think is better for reading a docx document as Java objects, and why?

In other words, which library supports the most Word tags?

becks asked Feb 21 '13 22:02




4 Answers

Disclosure: I lead the docx4j project

Although docx4j can also handle pptx and xlsx, it is mostly used for docx manipulation. By way of illustration, at the time of writing there are nearly 1000 topics in the docx4j forum, while the pptx forum has only 10% of that volume.

Whatever you want to do with the docx document, docx4j ought to be able to help you. There's a single page overview of a generic workflow.

For many common requirements, docx4j provides a higher-level API. These include:

  • Create/open/save docx (of course)

  • Report/document generation, using a variety of approaches: (i) Variable substitution, (ii) XML data binding (particularly strong), and (iii) Mailmerge

  • Export as HTML, XHTML

  • Export as PDF (with font support)

For anything else, you can manipulate the JAXB representation of the docx to your heart's content. JAXB is a Java community standard, included in Java 6 (it was bundled with the JDK until Java 11, and is a separate dependency after that), with a strong alternative implementation in EclipseLink's MOXy. (POI uses XMLBeans instead of JAXB.)

There's a web app to help you explore a docx, and generate Java code to create corresponding Java objects.

Of course, if there is some specific task you have in mind, it may be that docx4j or POI has a particular strength there.

Both docx4j and POI are ASL v2 licensed.

docx4j is actively maintained; its source code is on GitHub.

In addition, commercial support is available for docx4j if you want it, as are several commercial extensions, e.g. MergeDocx.

docx4j does rely on POI as a library for its implementation of the OLE 2 Compound Document format, which we're grateful for.

JasonPlutext answered Sep 28 '22 06:09


I think Apache POI's main focus is on spreadsheets, though it has features for reading Word documents, which it does using XMLBeans. docx4j mainly deals with docx documents, using JAXB. Since JAXB converts XML to Java objects, I think docx4j would be preferable in your case.

Mohamed Makthum answered Sep 28 '22 07:09


If you are dealing with docx documents, docx4j is more convenient than Apache POI. You can use the following links to learn the basics of docx4j. There is also a helpful docx4j forum.

1. http://blog.iprofs.nl/2012/09/06/creating-word-documents-with-docx4j/
2. http://www.smartjava.org/content/create-complex-word-docx-documents-programatically-docx4j?

lycaenidae answered Sep 28 '22 07:09


I tried Apache POI, but when printing anything from a docx file (e.g. all "Heading1" elements), the output contained a lot of bad data and whitespace. I also tried docx4j, and it avoids this bad data.

Venkatesh Dhanasekaran answered Sep 28 '22 05:09
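
For readers wondering what "printing all Heading1 elements" involves underneath either library, here is a minimal sketch that uses neither POI nor docx4j, only the JDK. It relies on the fact that a docx file is a zip archive whose main content lives in word/document.xml. The class name HeadingExtractor and the method paragraphsWithStyle are hypothetical names chosen for illustration, and the style id "Heading1" is an assumption: it is the id Word's default English template uses, but a given document may use a different one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of POI or docx4j.
class HeadingExtractor {
    // Main WordprocessingML namespace (the "w:" prefix in document.xml).
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text of each paragraph whose w:pStyle id equals styleId. */
    static List<String> paragraphsWithStyle(InputStream docx, String styleId)
            throws Exception {
        byte[] xml = readEntry(docx, "word/document.xml");
        if (xml == null) {
            throw new IOException("word/document.xml not found in archive");
        }
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        // Refuse DOCTYPE declarations, since the input may be untrusted.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        List<String> result = new ArrayList<>();
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element p = (Element) paragraphs.item(i);
            // Simplification: looks at descendants, so paragraphs nested in
            // text boxes are not treated specially.
            NodeList styles = p.getElementsByTagNameNS(W_NS, "pStyle");
            if (styles.getLength() == 0) {
                continue; // paragraph uses the default style
            }
            String val = ((Element) styles.item(0)).getAttributeNS(W_NS, "val");
            if (!styleId.equals(val)) {
                continue;
            }
            // Concatenate the text of all w:t elements in the paragraph.
            StringBuilder text = new StringBuilder();
            NodeList texts = p.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < texts.getLength(); j++) {
                text.append(texts.item(j).getTextContent());
            }
            result.add(text.toString());
        }
        return result;
    }

    /** Reads one named entry from a zip stream, or returns null if absent. */
    private static byte[] readEntry(InputStream in, String name) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals(name)) {
                    continue;
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = zip.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            }
        }
        return null;
    }
}
```

A call such as HeadingExtractor.paragraphsWithStyle(new FileInputStream("report.docx"), "Heading1") would return the heading texts (the file name is a placeholder). This only reads w:t text runs and ignores tabs, breaks, fields and tracked changes, which is the kind of detail a full library handles for you.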