Automatic HTML simplifier tool?

Tags:

html

Whenever I see a problem that would be shared by others, with a solution that would be fun to implement, it usually turns out to have been solved already. I think it's best to stop myself and do a search before I dive into the coding.

Here's the situation: you can copy and paste sections of an Office document into the Visual Studio HTML editor. The problem is that it creates HTML that looks like this:

<tr style="mso-yfti-irow:0;mso-yfti-firstrow:yes">
                <td style="border:solid windowtext 1.0pt;mso-border-alt:solid windowtext .5pt;
   padding:0cm 5.4pt 0cm 5.4pt" valign="top">
                    <p align="left" class="MsoNormal" 
                        style="text-align:left;tab-stops:center 216.0pt right 432.0pt">
                        <b style="mso-bidi-font-weight:normal"><span lang="EN-US">ID<o:p></o:p></span></b></p>
                </td>
                <td style="border:solid windowtext 1.0pt;border-left:none;
   mso-border-left-alt:solid windowtext .5pt;mso-border-alt:solid windowtext .5pt;
   padding:0cm 5.4pt 0cm 5.4pt" valign="top">

Fine for a machine, but not really human-readable. I bet this could be cleaned up by finding the repeated styles and turning them into CSS classes. A computer program could do that very easily.
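The idea of finding repeated styles and turning them into classes is straightforward to sketch. Below is a minimal Python version under a few assumptions: attributes are double-quoted (as in the Word output above), and regular expressions stand in for a real HTML parser. The names extract_classes, normalize and the generated class names s1, s2, ... are hypothetical. It counts every inline style, turns each one used at least twice into a CSS class, and rewrites the matching tags to use class= instead, merging with an existing class such as MsoNormal. Declaration order inside each style is kept, because a shorthand like border must stay ahead of an override like border-left.

```python
import re
from collections import Counter

# Regex sketch, not a full HTML parser; assumes double-quoted attributes.
TAG_RE = re.compile(r"<[A-Za-z][^<>]*>")  # opening and self-closing tags
STYLE_RE = re.compile(r'\s+style="([^"]*)"', re.IGNORECASE)
CLASS_RE = re.compile(r'\s+class="([^"]*)"', re.IGNORECASE)


def normalize(style: str) -> str:
    """Collapse whitespace inside each declaration but keep their order,
    since e.g. 'border' must stay ahead of a later 'border-left' override."""
    decls = (" ".join(d.split()) for d in style.split(";"))
    return ";".join(d for d in decls if d)


def extract_classes(html: str, min_count: int = 2, prefix: str = "s"):
    """Turn inline styles used at least min_count times into CSS classes.
    Returns (rewritten_html, css_text). Class names are hypothetical."""
    counts = Counter()
    for tag in TAG_RE.findall(html):
        m = STYLE_RE.search(tag)
        if m:
            counts[normalize(m.group(1))] += 1

    # Counter remembers first-seen order, so class numbers follow the page.
    names = {}
    for style, n in counts.items():
        if style and n >= min_count:
            names[style] = f"{prefix}{len(names) + 1}"

    def rewrite_tag(match):
        tag = match.group(0)
        m = STYLE_RE.search(tag)
        name = names.get(normalize(m.group(1))) if m else None
        if name is None:
            return tag  # no style, or a style used only once: leave as-is
        tag = tag[:m.start()] + tag[m.end():]  # drop the style attribute
        c = CLASS_RE.search(tag)
        if c:  # append to an existing class list, e.g. "MsoNormal s1"
            return tag[:c.end(1)] + " " + name + tag[c.end(1):]
        cut = len(tag) - 2 if tag.endswith("/>") else len(tag) - 1
        return tag[:cut] + f' class="{name}"' + tag[cut:]

    css = "\n".join(f".{name} {{ {style}; }}" for style, name in names.items())
    return TAG_RE.sub(rewrite_tag, html), css
```

The returned CSS text would go into a style block in the page head. Word's mso-* declarations are carried into the generated classes unchanged, so removing them beforehand would give shorter rules.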

I could run this program and end up with nice-looking, easy-to-maintain HTML that looks just like my Word document.

(Yes, I know I can just edit my Word document and then copy and paste it into HTML, or just save it as an HTML file. But that wouldn't be the same as hand-editing it after the fact.)

Anyway, does anyone know of a program that does this?


(later edit) I discovered the question I asked is a duplicate of this one.
Andrew Shepherd asked May 08 '09 06:05


2 Answers

HTML Tidy does this! It also integrates with common text editors (such as Notepad++ or UltraEdit) and provides an option to clean up Office web markup. You will need to set the word-2000 Boolean flag to true.
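To give a feel for the kind of Word-specific residue such a cleanup targets, here is a rough Python sketch that handles just two patterns visible in the question's snippet: mso-* style declarations and the Office-namespace o:p tags. It is an illustration only, not Tidy's implementation, and does not reproduce what Tidy's option actually does; strip_mso is a hypothetical helper, and like the snippet it assumes double-quoted style attributes.

```python
import re

# Regex sketch; assumes double-quoted style attributes as in Word's output.
STYLE_ATTR_RE = re.compile(r'(\s+)style="([^"]*)"', re.IGNORECASE)
# Office-namespace tags: <o:p>, </o:p> and <o:p/>. Only the tags are
# removed; in Word output they are typically empty.
OFFICE_P_RE = re.compile(r"</?o:p\s*/?>", re.IGNORECASE)


def strip_mso(html: str) -> str:
    """Hypothetical helper: remove mso-* CSS declarations and o:p tags."""
    def clean_style(m):
        decls = (" ".join(d.split()) for d in m.group(2).split(";"))
        kept = [d for d in decls if d and not d.lower().startswith("mso-")]
        if not kept:
            return ""  # nothing left: drop the whole style attribute
        joined = ";".join(kept)
        return f'{m.group(1)}style="{joined}"'

    html = STYLE_ATTR_RE.sub(clean_style, html)
    return OFFICE_P_RE.sub("", html)
```

Applied to the bold "ID" cell from the question, this leaves a plain b and span with no style attribute and no o:p tags.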

Additionally, Jeff Atwood has blogged about this problem and presented his own C# 2.0 solution in this article.

Cerebrus answered Nov 03 '22 19:11


I would try using HTML Tidy (http://tidy.sourceforge.net/). Another option is to paste your Word document into TinyMCE and then save the resulting HTML.

Michal Rogozinski answered Nov 03 '22 19:11