 

convert docx with (ordered) list to html

Tags:

html

docx

I'm trying to convert a large docx document containing a multi-level ordered list to HTML. (An example of the document is here: http://docdro.id/X1oyfBv. You need to download it.)

I tried the following:

  • online converters such as html-cleaner and index.html (which only recognize one layer of the list)

  • saving as HTML, which creates a horrendous file and still doesn't recognize the ol structure

  • saving the file as a zip and opening the XML file inside, but I don't see an easy way to get the ol structure out of the w:... tags

  • saving it to Google Docs and running Omar Alzabir's script http://omaralzabir.com/wp-content/uploads/2014/05/GoogleDocsEmail.jpg

By the way, if I create a new Word file with a multi-level ordered list and convert it, the list is recognized as ol elements. The list in the existing file is not recognized, even if I 'un-list' it and list it again. So possibly there is something wrong with how the original document was created (?)
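
One way to check how the existing file encodes its list is to look at the numbering properties stored on each paragraph in word/document.xml. The following is a minimal sketch, not a converter: it assumes list paragraphs carry a direct w:pPr/w:numPr element with w:ilvl (level) and w:numId (list id) children, and the function names list_levels and list_levels_from_docx are made up for this example. Paragraphs that come back with no level may be numbered through a paragraph style or typed by hand, which this sketch does not detect.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by the w:... tags in document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def list_levels(document_xml):
    """Return one (level, num_id, text) tuple per paragraph.

    level and num_id are None when the paragraph has no direct w:numPr.
    """
    root = ET.fromstring(document_xml)
    rows = []
    for p in root.iter(f"{{{W}}}p"):
        # Concatenate the text runs of the paragraph
        text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        level = num_id = None
        num_pr = p.find("w:pPr/w:numPr", NS)
        if num_pr is not None:
            ilvl = num_pr.find("w:ilvl", NS)
            nid = num_pr.find("w:numId", NS)
            if ilvl is not None:
                level = int(ilvl.get(f"{{{W}}}val"))
            if nid is not None:
                num_id = int(nid.get(f"{{{W}}}val"))
        rows.append((level, num_id, text))
    return rows


def list_levels_from_docx(path):
    """Read word/document.xml out of a .docx (a zip archive) and inspect it."""
    with zipfile.ZipFile(path) as archive:
        return list_levels(archive.read("word/document.xml"))
```

Running list_levels_from_docx on both the problem file and a freshly created test file, then comparing the level and list id columns, may show whether the two documents encode their lists differently.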

Any suggestions, or indications as to why this problem occurs, would be much appreciated :)

Cabuy asked Mar 04 '26 16:03



2 Answers

Are you asking how to save a Word document in HTML format with multi-level ordered lists?

Word-HTML has bugs in its multi-level ordered lists. For the list-items, the indentation tends to be incorrect and inconsistent. There's an example here.

Word-HTML has similar bugs in its multi-level unordered lists. An example is here.

I recently wrote a Python program that fixes these bugs in Word's HTML. The program is part of WordWebNav (WWN), which is free and open-source.

WWN is an app that converts a Microsoft Word document to a usable web page. It adds some features that are missing from the Word-HTML web page (e.g., a navigation pane), and it fixes bugs in the Word-HTML.

JimYuill answered Mar 07 '26 05:03



You can use pandoc: https://github.com/jgm/pandoc

It is an open-source, universal command-line tool for converting documents between markup formats.

You can use it like this:

  pandoc -o output.html input.docx
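
If the conversion needs to run from a script, the same command can be wrapped in Python. This is only a sketch, assuming pandoc is installed and on the PATH; the helper names pandoc_command and convert are made up for this example. It spells out the input and output formats with pandoc's --from and --to options instead of relying on the file extensions.

```python
import subprocess


def pandoc_command(src, dst):
    """Build the pandoc argument list for a docx-to-HTML conversion."""
    return ["pandoc", "--from", "docx", "--to", "html", "--output", dst, src]


def convert(src, dst):
    """Run pandoc; raises CalledProcessError if pandoc exits with an error."""
    subprocess.run(pandoc_command(src, dst), check=True)
```

Calling convert("input.docx", "output.html") should then be equivalent to the command line above.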
Rémi Becheras answered Mar 07 '26 04:03



