
Convert MS Word texts to plain valid html/css

I'm looking for a way to convert a few paragraphs and ordered/unordered lists from an MS Word file to HTML.

The problem is that when I save the Word file as "htm/html" (I'm using Word 2010), I get tons of unwanted CSS directives, some MS-invented and some valid CSS, that I don't want in my HTML code. Even more problematic, the ordered/unordered lists are not encoded as OL and UL with LI items at all, but in a crazy Microsofty encoding.

For example, a paragraph (Styled as "Normal" in Word) is converted to:

<p class=MsoNormal>
 <span style='font-size:10.0pt;line-height:115%;mso-bidi-font-style:italic'>
  bla bla </span></p>

And I just want it to plainly be:

<p><span>bla bla</span></p>  

Even more horrific, a simple unordered list ("bulleted list") with a single list item is converted to:

<p class=MsoListParagraph style='text-indent:-18.0pt;mso-list:l0 level1 lfo1'>
 <![if !supportLists]>
  <span style='font-family:Symbol;mso-fareast-font-family:Symbol;mso-bidi-font-family:Symbol'>
   <span style='mso-list:Ignore'>·
    <span style='font:7.0pt "Times New Roman"'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

    </span></span></span><![endif]>
 <span dir=LTR></span>Bla bla</p>

While I wish to get:

<ul><li>Bla bla</li></ul>

Any ideas?

Thanks so much!

P.S. I'm using Zend Studio (maybe there's a built-in Eclipse/Zend-specific converter or something?)
P.P.S. The only MS Word options for exporting as HTML that I've found are in Options => Advanced => General => Web Options. Playing with these options didn't solve any of the above problems.

Israel asked Jul 24 '13 20:07


1 Answer

Ok, found a bizarre but working solution:

Use http://htmleditor.in/index.html and its "Paste from Word" option, but do it (ironically!) in Internet Explorer (tested with IE 9).

The reason: when I used Chrome, pressing "Paste from Word" brought up an HTML div-style pop-up asking for permission to access my clipboard data directly. When I pasted the text there with Ctrl+V, as instructed, the result was missing the bullets (the bulleted items were converted to paragraphs).

In IE 9, by contrast, I get an IE system-style pop-up instead of the div-style one, and pasting there keeps the bullets...

The irony is that to solve a problem that started with Microsoft, I used another Microsoft product which, probably because of its poor HTML compatibility, did exactly what I wanted... lol.
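
If a scripted route is preferred over pasting through a browser, the cleanup can probably be approximated with a few regular expressions. This is only a rough sketch, not what the editor above does internally: the function name clean_word_html is made up for illustration, and it assumes the markup looks like the two samples in the question (list items as p elements with class MsoListParagraph, bullet glyphs wrapped in supportLists conditional blocks).

```python
import re


def clean_word_html(html: str) -> str:
    """Reduce Word 'Save as HTML' markup to plain p/ul/li tags (rough sketch)."""
    # 1. Drop the conditional blocks that carry the fake bullet glyph.
    html = re.sub(r"<!\[if !supportLists\]>.*?<!\[endif\]>", "", html, flags=re.S)
    # 2. Word writes list items as paragraphs with class MsoListParagraph*.
    html = re.sub(
        r"<p\b[^>]*\bclass=[\"']?MsoListParagraph[^>]*>(.*?)</p>",
        r"<li>\1</li>",
        html,
        flags=re.S | re.I,
    )
    # 3. Strip all attributes (class, style, dir, ...) from opening tags.
    html = re.sub(r"<([a-zA-Z][\w:-]*)(?:\s[^>]*)?>", r"<\1>", html)
    # 4. Remove spans that are now empty, e.g. the former <span dir=LTR></span>.
    html = re.sub(r"<span>\s*</span>", "", html)
    # 5. Collapse whitespace runs and trim just inside p/li/span tags.
    html = re.sub(r"\s+", " ", html)
    html = re.sub(r"(<(?:p|li|span)>) ", r"\1", html)
    html = re.sub(r" (</(?:p|li|span)>)", r"\1", html)
    # 6. Join adjacent list items and wrap each run in a single <ul>.
    html = re.sub(r"</li>\s+<li>", "</li><li>", html)
    html = re.sub(
        r"(?:<li>.*?</li>)+",
        lambda m: "<ul>" + m.group(0) + "</ul>",
        html,
        flags=re.S,
    )
    return html.strip()


if __name__ == "__main__":
    sample = (
        "<p class=MsoNormal>\n"
        " <span style='font-size:10.0pt;line-height:115%'>\n"
        "  bla bla </span></p>"
    )
    print(clean_word_html(sample))
```

Caveats: regular expressions are not an HTML parser, so this will break on markup that differs much from the samples. It always emits ul (the sketch does not try to tell ordered from unordered lists), and it leaves HTML comments and Word's namespaced tags in place.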

Israel answered Sep 30 '22 11:09