
Convert Word doc to HTML programmatically in Java

I need to convert a Word document into one or more HTML files in Java. The function takes a Word document as input and produces one HTML file per page: for example, a 3-page Word document yields 3 HTML files, split at the page breaks.

I searched for open-source or non-commercial APIs that can convert doc to HTML, but found nothing. If anyone has done this kind of work before, please help.

Thanks

kaychaks asked Oct 22 '08 19:10


1 Answer

I recommend JODConverter. It leverages OpenOffice.org, which provides arguably the best import/export filters for OpenDocument and Microsoft Office formats available today.

JODConverter has extensive documentation, scripts, and tutorials to help you get started.
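
To give a rough idea of what an office-backed conversion looks like, here is a minimal sketch that does not use the JODConverter API at all. Instead it shells out to the office suite's own headless converter, which is the same kind of engine JODConverter drives. It assumes a recent LibreOffice install whose `soffice` binary is on the PATH; the class name `DocToHtml` and its helper methods are made up for illustration. It produces a single HTML file for the whole document, so splitting the output per page would still need a separate step.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of JODConverter.
final class DocToHtml {

    // Assumption: the LibreOffice launcher is reachable as "soffice" on the PATH.
    static final String SOFFICE = "soffice";

    // Builds the command line for a headless conversion of one file to HTML.
    static List<String> buildCommand(Path input, Path outputDir) {
        return Arrays.asList(
                SOFFICE,
                "--headless",               // run without a GUI
                "--convert-to", "html",     // target format
                "--outdir", outputDir.toString(),
                input.toString());
    }

    // Runs the conversion and returns the exit code of the office process.
    static int convert(Path input, Path outputDir)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, outputDir))
                .inheritIO()                // show the converter's messages
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: DocToHtml <input.doc> <output-dir>");
            return;
        }
        int exitCode = convert(Paths.get(args[0]), Paths.get(args[1]));
        if (exitCode != 0) {
            System.err.println("conversion failed with exit code " + exitCode);
        }
    }
}
```

Calling `DocToHtml.convert(Paths.get("report.doc"), Paths.get("out"))` should leave the HTML output in the `out` directory, named after the input file. Starting a new office process per document is slow, which is one reason to prefer JODConverter when converting many files.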

Fisher answered Oct 10 '22 17:10