
BeautifulSoup equivalent for C# [closed]

Is there a library similar to BeautifulSoup for C#?

I simply want to parse HTML and XML, especially HTML that contains errors.

Juan Carlos asked Nov 30 '12 19:11




1 Answer

I have used HtmlAgilityPack in the past with some success, but it had some issues parsing HTML that was badly formed or missing closing tags. However, that was about two years ago.

I have usually tended toward SgmlReader, which can be used as an XmlReader, so you can easily read the HTML into an XDocument or XmlDocument in C#. SgmlReader has worked on all the malformed HTML that I have thrown at it.
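As a rough illustration of that approach, here is a minimal sketch. It assumes the SgmlReader NuGet package (namespace `Sgml`) is installed; the `HtmlLoader` class, its `Parse` method, and the sample markup are hypothetical names made up for this example, and the property settings are only one reasonable configuration.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using Sgml; // assumption: provided by the SgmlReader NuGet package

static class HtmlLoader
{
    // Hypothetical helper: reads possibly malformed HTML through SgmlReader
    // and loads the result into an XDocument.
    public static XDocument Parse(string html)
    {
        using (var input = new StringReader(html))
        using (var sgml = new SgmlReader())
        {
            sgml.DocType = "HTML";                            // use the built-in HTML DTD
            sgml.CaseFolding = CaseFolding.ToLower;           // normalize tag names to lower case
            sgml.WhitespaceHandling = WhitespaceHandling.All; // keep whitespace nodes
            sgml.InputStream = input;

            // SgmlReader is an XmlReader, so XDocument can load from it directly.
            return XDocument.Load(sgml);
        }
    }
}

static class Program
{
    static void Main()
    {
        // Sample input with unclosed <p> and <a> tags.
        var doc = HtmlLoader.Parse(
            "<html><body><p>First<p>Second <a href='/x'>link</body>");

        // Match on the local name so the query works whether or not
        // the elements end up in an XML namespace.
        var links = doc.Descendants().Where(e => e.Name.LocalName == "a");
        foreach (var a in links)
        {
            Console.WriteLine((string)a.Attribute("href"));
        }
    }
}
```

Once the document is loaded, the usual LINQ to XML queries apply, which is roughly the role that tree navigation plays in BeautifulSoup. If you prefer XmlDocument, its `Load` method also accepts an XmlReader.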

Adam Gritt answered Sep 17 '22 18:09
