I have the following HTML:
<html><body><p>n<sup>th</sup></p></body></html>
I am using the command:
$ libreoffice --convert-to docx:"MS Word 2007 XML" test.html
to convert that HTML into a DOCX file. However, I notice that the resulting DOCX file does not actually contain the <sup> tag. It looks like it is using position and size to replicate the <w:vertAlign> tag:
<w:position w:val="8"/><w:sz w:val="19"/>
What I need to know is how to make LibreOffice put in the <w:vertAlign> tag instead of using position and size.
Additional Info:
I had a similar problem with bold and italics (<strong><em>) but was able to get the conversion to work correctly if I converted the strong and em tags to b and i tags respectively.
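For reference, the bold/italic workaround described above can be sketched in Python roughly as follows. This is only an illustration: the function name `simplify_tags` is made up for this example, and it assumes the HTML is simple enough that a regular expression on tag names is safe (no `<strong>` inside comments or scripts).

```python
import re

# Hypothetical helper: maps semantic tags to the presentational ones
# that converted correctly in the question.
TAG_MAP = {"strong": "b", "em": "i"}

# Matches "<strong", "</strong", "<em", "</em"; \b keeps "<embed" untouched.
TAG_PATTERN = re.compile(r"<(/?)(strong|em)\b", re.IGNORECASE)


def simplify_tags(html: str) -> str:
    """Rewrite strong/em tag names to b/i, keeping any attributes."""
    return TAG_PATTERN.sub(
        lambda m: "<" + m.group(1) + TAG_MAP[m.group(2).lower()], html
    )
```

The rewritten HTML can then be saved and passed to the same libreoffice command as before.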
If you are looking to edit the HTML, it would be much better to use a tool that is suited for editing HTML, such as Notepad++ or Sublime (as examples).
If you need to have the HTML as a LibreOffice document for a specific reason, you could open the HTML file in Notepad and save as a text file with .txt as the extension. That should allow you to open the document in LibreOffice.
You can try using a WYSIWYG (What You See Is What You Get) editor like TinyMCE (http://www.tinymce.com/). There are lots of them online, and you can also find some desktop applications for that. But if you want to convert the HTML to DOCX, you can try http://htmltodocx.codeplex.com/. It is written in PHP, uses PHPWord, and is quite efficient.
Just create a Python script that replaces your unwanted tags with the <w:vertAlign> tag wherever needed.
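A minimal sketch of such a script might look like the one below. It rests on several assumptions: that the raised text always appears as a <w:position> element with a positive value immediately followed by <w:sz> (optionally <w:szCs>), as in the output quoted in the question; that nothing else in the document uses that pattern on purpose; and that the main body lives in word/document.xml. The function names `fix_superscripts` and `rewrite_docx` are made up for this example.

```python
import re
import zipfile

# Assumption: LibreOffice writes superscripts as a positive w:position
# followed directly by w:sz (and possibly w:szCs), as in the question.
RAISED_RUN = re.compile(
    r'<w:position w:val="[1-9]\d*"/>\s*<w:sz w:val="\d+"/>'
    r'(?:\s*<w:szCs w:val="\d+"/>)?'
)


def fix_superscripts(document_xml: str) -> str:
    """Swap the position/size pair for a real superscript property."""
    return RAISED_RUN.sub('<w:vertAlign w:val="superscript"/>', document_xml)


def rewrite_docx(src_path, dst_path):
    """Copy a DOCX archive, patching only word/document.xml."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                text = data.decode("utf-8")
                data = fix_superscripts(text).encode("utf-8")
            # Reuse the original entry metadata for every part.
            dst.writestr(item, data)
```

It could be run after the conversion step, for example `rewrite_docx("test.docx", "test-fixed.docx")`. Subscripts, which would use a negative position value, are deliberately left alone here.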