HTML Parsing Libraries for .NET [closed]

Question

I'm looking for libraries that parse HTML so I can extract links, forms, tags, etc.

http://www.majestic12.co.uk/projects/html_parser.php
http://www.netomatix.com/Products/DocumentManagement/HtmlParserNet.aspx
http://www.developer.com/net/csharp/article.php/2230091

LGPL or any other license friendly to commercial development is preferable.

Do you have any experience with one of these libraries? Or could you recommend another similar library?

Marc Gravell · Accepted Answer

The HTML Agility Pack has examples of exactly this type of thing, and it uses XPath, so the queries will look familiar. For example (from the home page), finding all links is simply:

foreach (HtmlNode link in doc.DocumentElement.SelectNodes("//a[@href]")) {
    //...
}

EDIT

As of 6/19/2012, the code above, as well as the only code sample shown on the HTML Agility Pack examples page, won't work. It just needs slight tweaking, as shown below:

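using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");

// SelectNodes returns null (not an empty collection) when nothing matches.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
  foreach (HtmlNode link in links)
  {
    HtmlAttribute att = link.Attributes["href"];
    att.Value = Foo(att); // fix the link; Foo stands in for your own rewriting logic
  }
}
doc.Save("file.htm");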
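
If you only want to read the links rather than rewrite them, which is closer to what the question asks, something along these lines should work. This is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the LinkExtractor class and its ExtractLinks method are hypothetical names made up for illustration, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of HTML Agility Pack.
public static class LinkExtractor
{
    // Returns the href value of every <a> element that has an href attribute.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a file

        var hrefs = new List<string>();

        // SelectNodes returns null when the XPath matches nothing.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return hrefs;
        }

        foreach (HtmlNode a in anchors)
        {
            // The second argument is the fallback used if the attribute is missing.
            hrefs.Add(a.GetAttributeValue("href", string.Empty));
        }
        return hrefs;
    }

    public static void Main()
    {
        string html = "<p><a href=\"/one\">One</a> <a name=\"x\">no href</a> <a href=\"/two\">Two</a></p>";
        foreach (string href in ExtractLinks(html))
        {
            Console.WriteLine(href);
        }
    }
}
```

The [@href] predicate keeps anchors without an href (such as named anchors) out of the result, and the null check means a page with no links gives you an empty list instead of an exception.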

Tags: html, dom, .net, parsing

dr. evil
