I am creating an HTML document using HTML agility pack. I load a template file then append content to it. All of this works, but when I view the output file it has removed the closing tag from my <br/>
tags to look like this <br>
. What is causing this?
Dim doc As New HtmlDocument()
doc.Load(Server.MapPath("Template.htm"))
Dim title As HtmlNode = doc.DocumentNode.SelectSingleNode("//title")
title.InnerHtml = title.InnerHtml & "CEU Classes"
Dim topContent As HtmlAgilityPack.HtmlNode = doc.GetElementbyId("topContent")
topContent.InnerHtml = html.ToString
doc.OptionWriteEmptyNodes = True
doc.Save(outputFileName, Encoding.UTF8)
More info:
It was removing my closing image tags, after I added doc.OptionWriteEmptyNodes = True
, it quite doing that.
Update
This is my code as it stands now that removes the closing BR tag
Dim html As String = "Words<br/>more words"
Dim doc As New HtmlDocument()
Dim title As HtmlNode
Dim topContent As HtmlNode
HtmlNode.ElementsFlags("br") = HtmlElementFlag.Empty
doc.Load(Server.MapPath("Template.htm"))
Title = doc.DocumentNode.SelectSingleNode("//title")
title.InnerHtml = title.InnerHtml & "CEU Classes"
topContent = doc.GetElementbyId("topContent")
topContent.InnerHtml = html.ToString
doc.OptionWriteEmptyNodes = True
doc.Save(outputFileName, Encoding.UTF8)
Update 2
I ended up just reading in my template file as a standard string then loading the html like this
Dim TemplateHTML As String = File.ReadAllText(Server.MapPath("Template.htm"))
TemplateHTML = TemplateHTML.Insert(TemplateHTML.IndexOf("<div id=""topContent"">") + "<div id=""topContent"">".Length, _
html.ToString)
doc.LoadHtml(TemplateHTML)
It happens because the Html Agility Pack handles the BR in a special way. It still supports old (but existing on the web today) HTML 3.2 syntax where the BR could be declared without a closing tag at all (browsers also still handle it gracefully by the way...).
To change this default behavior, you need to modify the HtmlNode.ElementFlags
property, like this:
Dim doc As New HtmlDocument()
HtmlNode.ElementsFlags("br") = HtmlElementFlag.Empty
doc.LoadHtml("<test>before<br/>after</test>")
doc.OptionWriteEmptyNodes = True
doc.Save(Console.Out)
which will display:
<test>before<br />after</test>
As per @Simon Mourier, the following C# code works in version 1.4
var doc = new HtmlDocument();
HtmlNode.ElementsFlags["br"] = HtmlElementFlag.Empty;
doc.OptionWriteEmptyNodes = true;
doc.LoadHtml("Lorem ipsum dolor sit<br/>Lorem ipsum dolor sit");
var postParsed = doc.DocumentNode.WriteTo();
has the following string value for postParsed
"Lorem ipsum dolor sit<br />Lorem ipsum dolor sit"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With