I am using HTML Agility Pack to select an element and return it, along with everything it contains, from a loaded HTML string. To test my code, I ran it against the select tag example from w3schools:
<select name="cars">
<option value="volvo">Volvo XC90</option>
<option value="saab">Saab 95</option>
<option value="mercedes">Mercedes SLK</option>
<option value="audi">Audi TT</option>
</select>
When I try to select and return this with HTML Agility Pack, I get the following (the option closing tags are removed):
<select name="cars">
<option value="volvo">Volvo XC90
<option value="saab">Saab 95
<option value="mercedes">Mercedes SLK
<option value="audi">Audi TT
</select>
So I did some searching here and found an instruction to add the line: HtmlNode.ElementsFlags.Remove("option");
I did that, and now I get the following (the option text is moved outside of the option tags):
<select name="cars">
<option value="volvo"></option>Volvo XC90
<option value="saab"></option>Saab 95
<option value="mercedes"></option>Mercedes SLK
<option value="audi"></option>Audi TT
</select>
I would like the output to match the original HTML. What do I need to do to get that?
I was also experimenting with OptionWriteEmptyNodes: when I tested with input tags, their self-closing was being removed, and setting that option seemed to fix it. I have commented it out for now to make sure it isn't affecting this issue.
This is my .NET C# code:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(content);
HtmlNode.ElementsFlags.Remove("option"); // otherwise, the closing tag is removed.
//doc.OptionWriteEmptyNodes = true;
var nodes = doc.DocumentNode.SelectNodes("//select");
if (nodes == null)
return "Not found";
else
return nodes[0].OuterHtml;
You need to set the ElementsFlags entry for the option tag before loading the document to make it work:
HtmlNode.ElementsFlags["option"] = HtmlElementFlag.Closed;
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
This should return your original HTML.
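As a fuller sketch of the same idea, the helper below puts the flag assignment, the load, and the lookup in one place. The class and method names (SelectExtractor, FirstSelectOuterHtml) are made up for illustration and are not part of HTML Agility Pack. Note that ElementsFlags is static, so the assignment affects every document parsed in the process, not just this one.

```csharp
using HtmlAgilityPack;

public static class SelectExtractor
{
    // Hypothetical helper: returns the outer HTML of the first <select>,
    // or "Not found" when the markup contains none.
    public static string FirstSelectOuterHtml(string html)
    {
        // Assign the flag before LoadHtml so it is in effect while parsing.
        // ElementsFlags is static, so this changes behavior process-wide.
        HtmlNode.ElementsFlags["option"] = HtmlElementFlag.Closed;

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when the XPath matches nothing.
        HtmlNode select = doc.DocumentNode.SelectSingleNode("//select");
        return select == null ? "Not found" : select.OuterHtml;
    }
}
```

Calling SelectExtractor.FirstSelectOuterHtml(content) with the w3schools markup from the question would then take the place of the original method body.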
I believe HtmlAgilityPack behaves this way because the closing tag of the <option> element is, ironically, optional in HTML.
Taken from the documentation of the HtmlNode class and its ElementsFlags field:
Gets a collection of flags that define specific behaviors for specific element nodes. The table contains a DictionaryEntry list with the lowercase tag name as the Key, and a combination of HtmlElementFlags as the Value.
A further look at the HtmlElementFlag enum reveals this:
Empty - The node is empty. META or IMG are example of such nodes.
Closed - The node will automatically be closed during parsing.
You can view the source code for the class HtmlNode to see what other tags are considered 'specific'.
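If reading the source is inconvenient, a rough alternative is to print the table at runtime. The sketch below assumes only that ElementsFlags exposes Keys and an indexer, which should hold whether the installed version uses a Hashtable or a generic dictionary. The class and method names are hypothetical.

```csharp
using System;
using HtmlAgilityPack;

public static class ElementFlagsDump
{
    // Hypothetical helper: writes each tag name and its flags to the console.
    public static void PrintAll()
    {
        foreach (var key in HtmlNode.ElementsFlags.Keys)
        {
            // The value is an HtmlElementFlag, possibly a combination of several flags.
            Console.WriteLine($"{key}: {HtmlNode.ElementsFlags[key]}");
        }
    }
}
```

Run it before changing any flags to see the defaults for the version you have installed.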