I'm using AngleSharp to parse HTML5. At the moment I wrap the elements I want to parse in a little bit of HTML to make a valid HTML5 document and then run the parser on that. Is there a better way of doing it? That is, can I parse specific elements directly and validate that the structure is indeed HTML5?
Hm, a little example would be nice. But AngleSharp does support fragment parsing, which sounds like the thing you want. In general, fragment parsing is also applied when you set properties like InnerHtml, which transform strings into DOM nodes.
You can use the ParseFragment method of the HtmlParser class to get a list of nodes contained in the given source code. An example:
using System.Diagnostics;
using AngleSharp.Parser.Html;
// ...
var source = "<div><span class=emphasized>Works!</span></div>";
var parser = new HtmlParser();
var nodes = parser.ParseFragment(source, null); // null = no context given
if (nodes.Length == 0)
    Debug.WriteLine("Apparently something bad happened...");
foreach (var node in nodes)
{
    // Examine the node
}
Usually all nodes will be IText or IElement types. Comments (IComment) are also possible. You will never see IDocument or IDocumentFragment nodes attached to such an INodeList. However, since HTML5 parsing is quite robust, it is very likely that you will never experience "errors" using this method.
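To make the "examine the node" step more concrete, here is a minimal sketch of telling the three node kinds apart. It assumes the same AngleSharp version as the snippet above and that the IElement, IText and IComment interfaces live in the AngleSharp.Dom namespace with LocalName and Data properties; the class name FragmentDemo and the sample markup are made up for illustration.

```csharp
using System.Diagnostics;
using AngleSharp.Dom;          // assumed home of IElement, IText, IComment
using AngleSharp.Parser.Html;

static class FragmentDemo      // hypothetical name, for illustration only
{
    static void Main()
    {
        var source = "<div><span class=emphasized>Works!</span></div><!-- note -->tail";
        var parser = new HtmlParser();
        var nodes = parser.ParseFragment(source, null); // null = no context given

        foreach (var node in nodes)
        {
            var element = node as IElement;
            var text = node as IText;
            var comment = node as IComment;

            if (element != null)
                Debug.WriteLine("Element: " + element.LocalName);
            else if (text != null)
                Debug.WriteLine("Text: " + text.Data);
            else if (comment != null)
                Debug.WriteLine("Comment: " + comment.Data);
        }
    }
}
```

Only the top-level nodes of the fragment are visited here; nested content such as the span is reachable through the element's child nodes.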
What you can do is to look for (parsing) errors. You need to provide an IConfiguration that exposes an event aggregator, which collects such events. The simplest implementation for aggregating only such events (without the possibility of adding or removing multiple handlers) is the following:
using System.Collections.Generic;
using AngleSharp.Events;
// ...
class SimpleEventAggregator : IEventAggregator
{
    readonly List<HtmlParseErrorEvent> _errors = new List<HtmlParseErrorEvent>();

    // Keeps parse error events and silently ignores every other event type.
    public void Publish<TEvent>(TEvent data)
    {
        var error = data as HtmlParseErrorEvent;
        if (error != null)
            _errors.Add(error);
    }

    public List<HtmlParseErrorEvent> Errors
    {
        get { return _errors; }
    }

    // Subscriptions are intentionally not supported in this minimal version.
    public void Subscribe<TEvent>(ISubscriber<TEvent> listener) { }
    public void Unsubscribe<TEvent>(ISubscriber<TEvent> listener) { }
}
The simplest way to use the event aggregator with a configuration is to instantiate a new (provided) Configuration. Here is a sample snippet:
using AngleSharp;
// ...
var errorEvents = new SimpleEventAggregator();
var config = new Configuration(events: errorEvents);
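For completeness, a rough sketch of how the pieces might fit together: the configuration is handed to the parser, a fragment is parsed, and the collected errors are inspected afterwards. This reuses the SimpleEventAggregator class from above. It assumes that HtmlParser has a constructor accepting an IConfiguration and that HtmlParseErrorEvent exposes Code and Message properties; please check both against the AngleSharp version you use. The class name ErrorCheckDemo and the deliberately unclosed sample markup are illustrative only.

```csharp
using System.Diagnostics;
using AngleSharp;
using AngleSharp.Parser.Html;

static class ErrorCheckDemo    // hypothetical name, for illustration only
{
    static void Main()
    {
        // SimpleEventAggregator is the class shown earlier in this answer.
        var errorEvents = new SimpleEventAggregator();
        var config = new Configuration(events: errorEvents);

        // Assumption: the parser can be constructed from a configuration.
        var parser = new HtmlParser(config);
        var nodes = parser.ParseFragment("<div><span>never closed", null);

        Debug.WriteLine("Top-level nodes: " + nodes.Length);
        Debug.WriteLine("Reported parse errors: " + errorEvents.Errors.Count);

        foreach (var error in errorEvents.Errors)
        {
            // Assumption: Code and Message are available on the event.
            Debug.WriteLine(error.Code + ": " + error.Message);
        }
    }
}
```

An empty Errors list after parsing would then be the closest thing to "the fragment follows the HTML5 parsing rules", which is what the question asks to validate.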
Please note: Every error that is reported is an "official" error (according to W3C spec.). These errors do not indicate that the provided code is malicious or invalid, just that something is not following the spec and that a fallback had to be applied.
Hope this answers your question. If not, then please let me know.
Update: Updated the answer for the latest version of AngleSharp.