Losing superscript tag when converting HTML to DOCX using libreoffice

I have the following HTML:

<html><body><p>n<sup>th</sup></p></body></html>

I am using the command:

$ libreoffice --convert-to docx:"MS Word 2007 XML" test.html

This converts the HTML into a DOCX file. However, I notice that the resulting DOCX file does not represent the <sup> as a <w:vertAlign> tag. Instead, it imitates the superscript with a raised position and a smaller font size:

<w:position w:val="8"/><w:sz w:val="19"/>
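A quick way to confirm which form a converted file uses is to look inside it: a DOCX file is a ZIP archive, and the body text lives in word/document.xml. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the counts come from regular expressions, so treat them as a rough check rather than a real XML parse.

```python
import re
import zipfile


def read_document_xml(docx_path):
    """Return the main body XML of a DOCX file.

    A DOCX file is a ZIP archive; the body text lives in word/document.xml.
    """
    with zipfile.ZipFile(docx_path) as zf:
        return zf.read("word/document.xml").decode("utf-8")


def count_run_properties(xml):
    """Roughly count the run properties involved in the superscript question.

    Regex-based, not an XML parse. The \\b after "sz" keeps <w:szCs> out of
    the <w:sz> count.
    """
    return {
        "vertAlign": len(re.findall(r"<w:vertAlign\b", xml)),
        "position": len(re.findall(r"<w:position\b", xml)),
        "sz": len(re.findall(r"<w:sz\b", xml)),
    }
```

For example, count_run_properties(read_document_xml("test.docx")) on output like the snippet above should report a non-zero "position" count and a "vertAlign" count of 0.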

What I need to know is how to make LibreOffice emit the <w:vertAlign> tag instead of using position and size.

Additional Info:

I had a similar problem with bold and italics (<strong> and <em>), but I was able to get the conversion to work correctly by converting the strong and em tags to b and i tags, respectively.
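The tag substitution described above can be scripted before calling LibreOffice. This is a minimal sketch, assuming the input is simple, well-formed HTML like the example; the function name simplify_inline_tags is hypothetical, and a regular expression is not a general HTML parser (it would also rewrite matching text inside comments or <script> blocks).

```python
import re

# Tags to rewrite before handing the HTML to LibreOffice.
TAG_MAP = {"strong": "b", "em": "i"}

# Matches opening and closing <strong>/<em> tags, keeping any attributes.
# The \b stops "em" from matching longer names such as <emph>.
_TAG_RE = re.compile(r"<(/?)(strong|em)\b([^>]*)>", re.IGNORECASE)


def simplify_inline_tags(html):
    """Replace <strong>/<em> with <b>/<i>; all other markup is left as-is."""
    def repl(match):
        closing, name, attrs = match.groups()
        return "<%s%s%s>" % (closing, TAG_MAP[name.lower()], attrs)

    return _TAG_RE.sub(repl, html)
```

Write the result to a new HTML file and pass that file to the libreoffice --convert-to command shown above. Note that this does not help with <sup>, which is left unchanged.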

Jason Ward asked May 22 '14 21:05


3 Answers

If you are looking to edit the HTML, it would be much better to use a tool suited for editing HTML, such as Notepad++ or Sublime Text.

If you need to have the HTML as a LibreOffice document for a specific reason, you could open the HTML file in Notepad and save it as a text file with a .txt extension. That should allow you to open the document in LibreOffice.

Patricia Green answered Nov 18 '22 11:11


You can try using a WYSIWYG (What You See Is What You Get) editor like TinyMCE (http://www.tinymce.com/). There are many of them online, and desktop applications are available as well. If you want to convert the HTML to DOCX, you can try http://htmltodocx.codeplex.com/. It is written in PHP, uses PHPWord, and is quite efficient.

kk3nny answered Nov 18 '22 11:11


Just create a Python script that replaces the unwanted <w:position>/<w:sz> tags in the generated DOCX with the <w:vertAlign> tag wherever needed.
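A minimal sketch of that approach, using only the Python standard library. It assumes LibreOffice writes superscript runs exactly as in the question (a positive w:position immediately followed by w:sz, optionally w:szCs) with no whitespace between them; lowered text (negative positions) is left untouched. The function names are hypothetical, the substitution is regex-based rather than a real XML rewrite, and the result is not validated against the OOXML schema.

```python
import re
import zipfile

# Assumption: superscripts look like the question's output, e.g.
# <w:position w:val="8"/><w:sz w:val="19"/> (optionally followed by w:szCs).
# Only positive positions match, so lowered (subscript-like) text is skipped.
_RAISED_RUN_RE = re.compile(
    r'<w:position w:val="[1-9]\d*"/><w:sz w:val="\d+"/>(?:<w:szCs w:val="\d+"/>)?'
)
_SUPERSCRIPT = '<w:vertAlign w:val="superscript"/>'


def fix_superscript_xml(xml):
    """Replace raised position/size run properties with an explicit superscript."""
    return _RAISED_RUN_RE.sub(_SUPERSCRIPT, xml)


def fix_superscript_docx(src, dst):
    """Copy the DOCX at src to dst, rewriting only word/document.xml.

    src and dst must be different paths.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                data = fix_superscript_xml(data.decode("utf-8")).encode("utf-8")
            # Reusing the original ZipInfo keeps each entry's name and compression type.
            zout.writestr(item, data)
```

For example, fix_superscript_docx("test.docx", "test-fixed.docx") writes a corrected copy and leaves the original file unchanged. Dropping the reduced w:sz is intentional: a w:vertAlign superscript run is rendered smaller by the word processor on its own.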

Vivek answered Nov 18 '22 11:11