
Converting HTML to Word Docx with style intact

Tags:

c#

openxml

I know there are already similar questions, and the answers to them suggest Open XML.

I am using Open XML, but it only works with inline styles.

Is there a solution to this, or a better way to convert HTML to DOCX than Open XML?

Thanks!

BreakHead asked Mar 23 '23 15:03


1 Answer

You can inline a CSS file using a tool like the one described here.
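As an illustration of what such an inlining step might look like in C#, here is a minimal sketch. It assumes the PreMailer.Net NuGet package, which is not necessarily the tool linked above, only one library that performs this kind of inlining. The helper name `InlineCss` is hypothetical.

```csharp
// Hypothetical helper. Assumes the PreMailer.Net NuGet package is installed;
// it is used here only as an example of a CSS inliner.
static string InlineCss(string html)
{
    // MoveCssInline copies rules from <style> blocks onto the matching
    // elements as style="..." attributes and returns the rewritten markup.
    return PreMailer.Net.PreMailer.MoveCssInline(html).Html;
}
```

The returned string could then be written to a file and used as the HTML input for the conversion step.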

Then, to perform the conversion (adapted from Eric White's blog):

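using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Open the existing target document for editing.
using (WordprocessingDocument myDoc =
    WordprocessingDocument.Open("ConvertedDocument.docx", true))
{
    string altChunkId = "AltChunkId1";
    MainDocumentPart mainPart = myDoc.MainDocumentPart;

    // Embed the HTML file in the package as an alternative-format part.
    var chunk = mainPart.AddAlternativeFormatImportPart(
        AlternativeFormatImportPartType.Html, altChunkId);
    using (FileStream fileStream = File.Open("YourHtmlDocument.html", FileMode.Open))
    {
        chunk.FeedData(fileStream);
    }

    // Reference the embedded part from the body, after the last paragraph.
    // Note: Last() throws if the body contains no paragraphs.
    AltChunk altChunk = new AltChunk() { Id = altChunkId };
    mainPart.Document.Body.InsertAfter(
        altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
    mainPart.Document.Save();
}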

This isn't exactly converting HTML to DOCX. It appends YourHtmlDocument.html to ConvertedDocument.docx. If ConvertedDocument.docx is initially empty, this approach is effectively a conversion.
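For this to work as a conversion, the target document has to exist before the snippet above runs. A minimal sketch of creating one, assuming the Open XML SDK (DocumentFormat.OpenXml) and the same file name as above; the helper name `CreateEmptyDocx` is hypothetical. It writes a body with a single empty paragraph, so the `Elements<Paragraph>().Last()` call has something to anchor to:

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper: creates a .docx whose body holds one empty paragraph.
static void CreateEmptyDocx(string path)
{
    using (WordprocessingDocument doc =
        WordprocessingDocument.Create(path, WordprocessingDocumentType.Document))
    {
        MainDocumentPart mainPart = doc.AddMainDocumentPart();
        mainPart.Document = new Document(new Body(new Paragraph()));
        mainPart.Document.Save();
    }
}

CreateEmptyDocx("ConvertedDocument.docx");
```

The resulting document will still begin with that empty paragraph, since the imported HTML is inserted after it.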

Whenever you use an AltChunk to build a document, your HTML stays embedded in the document until the next time the document is opened in Word. At that point, the HTML is converted to WordprocessingML markup. This is really only an issue if the document won't be opened in MS Word. If you are uploading to Google Docs, opening in OpenOffice, or using COM to convert to a PDF, Open XML alone won't be sufficient. In that case, you'll probably need to resort to a paid tool like Aspose.Words.
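One way to tell whether a given file still carries unconverted HTML is to look for remaining `altChunk` elements in its body. A rough sketch, assuming the Open XML SDK; the helper name `HasUnconvertedChunks` is hypothetical:

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper: returns true if the document body still contains
// altChunk elements, i.e. imported content that has not been converted yet.
static bool HasUnconvertedChunks(string path)
{
    // Open read-only; the file is only inspected, never modified.
    using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
    {
        Body body = doc.MainDocumentPart.Document.Body;
        return body.Descendants<AltChunk>().Any();
    }
}
```

A file produced by the AltChunk approach should return true here. Once Word has opened and re-saved it, the check would be expected to return false.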

Dan Garant answered Apr 06 '23 08:04