
HtmlAgilityPack: how to create indented HTML?

I am generating HTML with HtmlAgilityPack and it works fine, but the HTML output is not indented. I can get indented XML, but I need HTML. Is there a way?

using System.IO;
using System.Xml;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();

// generate HTML
HtmlNode table = doc.CreateElement("table");
table.Attributes.Add("class", "tableClass");
HtmlNode tr = doc.CreateElement("tr");
table.ChildNodes.Append(tr);
HtmlNode td = doc.CreateElement("td");
td.InnerHtml = "—";
tr.ChildNodes.Append(td);

// write HTML text, no indentation :(
using (StreamWriter sw = new StreamWriter("table.html"))
{
    table.WriteTo(sw);
}

// write XML, nicely indented, but it's XML!
XmlWriterSettings settings = new XmlWriterSettings();
settings.OmitXmlDeclaration = true;
settings.Indent = true;
settings.ConformanceLevel = ConformanceLevel.Fragment;
using (XmlWriter xw = XmlWriter.Create("table.xml", settings))
{
    table.WriteTo(xw);
}
Petr Abdulin asked May 09 '11 12:05


3 Answers

Fast, Reliable, Pure C#, .NET Core compatible AngleSharp

You can parse the markup with AngleSharp, which provides a way to auto-indent the output:

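// Namespaces as of AngleSharp 0.10+ (assumed; older versions differ)
using System.IO;
using AngleSharp;              // ToHtml extension method
using AngleSharp.Html;         // PrettyMarkupFormatter
using AngleSharp.Html.Parser;  // HtmlParser

// text holds the HTML markup to re-indent
var parser = new HtmlParser();
var document = parser.ParseDocument(text);
string indentedText;
using (var writer = new StringWriter())
{
    document.ToHtml(writer, new PrettyMarkupFormatter
    {
        Indentation = "\t",
        NewLine = "\n"
    });
    // keep the result available after the writer is disposed
    indentedText = writer.ToString();
}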
Fab answered Nov 06 '22 15:11


As far as I know, HtmlAgilityPack cannot do this. But you could look at the HTML Tidy packages suggested in similar questions.

Oleks answered Nov 06 '22 13:11


No, and it's a "by design" choice. There is a big difference between XML (or XHTML, which is XML, not HTML), where whitespace most of the time has no specific meaning, and HTML.

This is not such a minor improvement, as changing whitespace can change the way some browsers render a given HTML chunk, especially malformed HTML (which is in general well handled by the library). And the Html Agility Pack was designed to preserve the way the HTML is rendered, not to optimize the way the markup is written.

I'm not saying it's infeasible or outright impossible. Obviously you can convert to XML and voilà (and you could write an extension method to make this easier), but the rendered output may be different in the general case.
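
A minimal sketch of such an extension method, assuming the same XmlWriterSettings used in the question. The class and method names (HtmlNodeIndentExtensions, ToIndentedXml) are hypothetical and not part of HtmlAgilityPack. The result is XML-serialized markup, so it may render differently from the original HTML.

```csharp
using System.IO;
using System.Xml;
using HtmlAgilityPack;

public static class HtmlNodeIndentExtensions
{
    // Hypothetical helper (not part of HtmlAgilityPack): serializes a node
    // through an indenting XmlWriter, so the output is XML-style markup.
    public static string ToIndentedXml(this HtmlNode node)
    {
        var settings = new XmlWriterSettings
        {
            OmitXmlDeclaration = true,
            Indent = true,
            ConformanceLevel = ConformanceLevel.Fragment
        };

        using (var sw = new StringWriter())
        {
            // Dispose the XmlWriter first so everything is flushed to sw.
            using (var xw = XmlWriter.Create(sw, settings))
            {
                node.WriteTo(xw);
            }
            return sw.ToString();
        }
    }
}
```

With this in scope, the table from the question could be serialized with `string markup = table.ToIndentedXml();`.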

Simon Mourier answered Nov 06 '22 13:11