Is there any way to get AngleSharp to not create a full HTML document when parsing a fragment? For example, if I parse:
<title>The Title</title>
I get a full HTML document in DocumentElement.OuterHtml:
<html><head><title>The Title</title></head><body></body></html>
If I parse:
<p>The Paragraph</p>
I get another full HTML document:
<html><head></head><body><p>The Paragraph</p></body></html>
Notice that AngleSharp is smart enough to know where my fragment should go. In one case, it puts it in the HEAD tag, and in the other case, it puts it in the BODY tag.
This is clever, but if I just want the fragment back out, I don't know where to get it. I can't just call Body.InnerHtml because, depending on the HTML I parsed, my fragment might be in Head.InnerHtml instead.
Is there a way to get AngleSharp to not create a full document, or is there some other way to get my isolated fragment back out after parsing?
It is possible now. Below is an example copied from https://github.com/AngleSharp/AngleSharp/issues/594
var fragment = "<script>deane</script><div>deane</div>";
var p = new HtmlParser();
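// Parse a minimal document to obtain an element to use as the parsing context.
var dom = p.Parse("<html><body></body></html>");
// Parse the fragment as if it appeared inside <body>; returns only the fragment's nodes.
var nodes = p.ParseFragment(fragment, dom.Body);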
The second parameter of ParseFragment specifies the context in which the fragment is parsed. In your case you will need to parse the <title> in the context of dom.Head and the <p> in the context of dom.Body.
Oh wow, it is the OP's own code that I have just copied.
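To get the fragment back out as markup, the returned nodes can be serialized directly instead of reading Head.InnerHtml or Body.InnerHtml. The following is only a minimal sketch, assuming AngleSharp 0.10 or later, where the parser lives in AngleSharp.Html.Parser and Parse was renamed to ParseDocument. RoundTrip and FragmentDemo are hypothetical names made up for illustration, not part of AngleSharp. Note that this version only serializes element nodes, so bare text at the top level of the fragment would be dropped.

```csharp
using System;
using System.Linq;
using AngleSharp.Dom;
using AngleSharp.Html.Parser;

static class FragmentDemo
{
    // Hypothetical helper (not an AngleSharp API): parses `fragment` in the
    // context of `context` and concatenates the OuterHtml of the resulting
    // element nodes. Top-level text nodes are skipped by OfType<IElement>().
    static string RoundTrip(HtmlParser parser, string fragment, IElement context)
    {
        INodeList nodes = parser.ParseFragment(fragment, context);
        return string.Concat(nodes.OfType<IElement>().Select(e => e.OuterHtml));
    }

    static void Main()
    {
        var parser = new HtmlParser();

        // A minimal document that only exists to supply context elements.
        // Assumption: on AngleSharp 0.9.x this call is Parse(...) instead.
        var document = parser.ParseDocument("<html><head></head><body></body></html>");

        // The <title> fragment is parsed in the context of <head>,
        // the <p> fragment in the context of <body>.
        Console.WriteLine(RoundTrip(parser, "<title>The Title</title>", document.Head));
        Console.WriteLine(RoundTrip(parser, "<p>The Paragraph</p>", document.Body));
    }
}
```

The caller still has to choose the context element, so this does not detect on its own whether a fragment belongs in the head or the body.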
I have learned that this is not possible. AngleSharp is designed to generate a DOM exactly like the HTML spec says to do it. If you create an HTML document with the code I have above, open it in a browser, then inspect the DOM, you'll find the exact same situation. AngleSharp is in compliance.
What you can do is parse it as XML with errors suppressed, which should cause the document to self-correct dirty HTML issues, and give you a "clean" document which can then be manipulated.
var html = "<x><y><z>foo</y></z></x>";
var options = new XmlParserOptions()
{
    IsSuppressingErrors = true
};
var dom = new XmlParser(options).Parse(html);
One problem with this approach is that it doesn't handle entities perfectly: it still throws errors on some of them, even when errors are suppressed. It's on the list to be fixed.
Here's the GitHub issue that led to this answer:
https://github.com/AngleSharp/AngleSharp/issues/398