How to convert docx to html file using open xml with formatting

Tags: html, c#, openxml

I know there are a lot of questions with the same title, but none of them showed me the correct way to solve the issue I am currently having.

I am using the Open XML SDK 2.5 together with PowerTools for Open XML to convert a .docx file to an .html file. The conversion is done by the HtmlConverter class.

I can successfully convert the .docx file to HTML, but the HTML file doesn't retain the original formatting of the document. For example, font size, color, underline, and bold are not reflected in the HTML output.

Here is my existing code:

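using System.IO;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

public void ConvertDocxToHtml(string fileName)
{
    // Copy the file into an expandable in-memory stream so the document
    // can be opened for editing without touching the file on disk.
    byte[] byteArray = File.ReadAllBytes(fileName);
    using (MemoryStream memoryStream = new MemoryStream())
    {
        memoryStream.Write(byteArray, 0, byteArray.Length);
        using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStream, true))
        {
            HtmlConverterSettings settings = new HtmlConverterSettings()
            {
                PageTitle = "My Page Title"
            };
            // Convert the document to an XHTML element tree and write it out.
            XElement html = HtmlConverter.ConvertToHtml(doc, settings);
            File.WriteAllText(@"E:\Test.html", html.ToStringNewLineOnAttributes());
        }
    }
}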

So I would like to know whether there is any way to retain the formatting in the converted HTML file.

I know about some third-party APIs that do the same thing, but I would prefer a solution that uses Open XML or another open source library.

Sachin asked Dec 23 '13 19:12



1 Answer

PowerTools for Open XML has just released a new HtmlConverter module. It now contains a free, open source implementation of DOCX-to-HTML conversion, with the output formatted using CSS. The module, HtmlConverter.cs, supports all paragraph, character, and table styles, fonts and text formatting, numbered and bulleted lists, images, and more. See http://bit.ly/1bclyg9
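As a rough illustration of how the newer converter might be called, here is a minimal sketch. It assumes a version of OpenXmlPowerTools whose HtmlConverterSettings exposes the CSS-related properties FabricateCssClasses, CssClassPrefix, and AdditionalCss; the class name DocxToHtmlSketch, the method name ConvertWithCss, and its two path parameters are hypothetical and used only for illustration. Check the property names against the version you install.

```csharp
using System.IO;
using System.Text;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

public static class DocxToHtmlSketch
{
    // sourcePath and destPath are hypothetical parameters for this sketch.
    public static void ConvertWithCss(string sourcePath, string destPath)
    {
        // Work on an in-memory copy so the original .docx is never modified.
        byte[] bytes = File.ReadAllBytes(sourcePath);
        using (var stream = new MemoryStream())
        {
            stream.Write(bytes, 0, bytes.Length);
            using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, true))
            {
                var settings = new HtmlConverterSettings
                {
                    PageTitle = "My Page Title",
                    // Assumed settings: emit generated CSS classes with a prefix
                    // and append one extra rule for the page body.
                    FabricateCssClasses = true,
                    CssClassPrefix = "pt-",
                    AdditionalCss = "body { margin: 1cm auto; max-width: 20cm; padding: 0; }"
                };

                XElement html = HtmlConverter.ConvertToHtml(doc, settings);

                // Prepend an HTML5 doctype and write the result as UTF-8.
                var page = new XDocument(new XDocumentType("html", null, null, null), html);
                File.WriteAllText(destPath, page.ToString(SaveOptions.DisableFormatting), Encoding.UTF8);
            }
        }
    }
}
```

A call would look like DocxToHtmlSketch.ConvertWithCss(@"E:\Test.docx", @"E:\Test.html"). Images are probably not written out unless an image handler is also configured in the settings, so treat this as a starting point for text formatting only.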

Eric White answered Nov 14 '22 23:11