
BeautifulSoup and ASP.NET/C#

Has anyone integrated BeautifulSoup with ASP.NET/C# (possibly using IronPython or otherwise)? Is there a BeautifulSoup alternative or a port that works nicely with ASP.NET/C#?

The goal is to use the library to extract readable text from an arbitrary URL.

Thanks

user300981 asked Jul 28 '10 20:07


3 Answers

Html Agility Pack is a similar project, but for C# and .NET.


EDIT:

To extract all readable text:

doc.DocumentNode.InnerText

Note that this will also return the text content of <script> and <style> tags.

To fix that, you can remove all of the <script> and <style> tags first, like this:

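// Requires: using System.Linq; (for ToArray) and using HtmlAgilityPack;
// ToArray() snapshots the matches so nodes can be removed while iterating.
foreach (var script in doc.DocumentNode.Descendants("script").ToArray())
    script.Remove();
foreach (var style in doc.DocumentNode.Descendants("style").ToArray())
    style.Remove();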

(Credit: SLaks)
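
Putting the two steps together, a minimal sketch might look like the following. It assumes the HtmlAgilityPack NuGet package is installed. The class name `ReadableText`, the method name `Extract`, and the example URL are placeholders chosen for illustration, not part of the library.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class ReadableText
{
    // Hypothetical helper: returns the text of an HTML string
    // with <script> and <style> content stripped out.
    public static string Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Snapshot with ToArray() so nodes can be removed while iterating.
        foreach (var node in doc.DocumentNode.Descendants()
                     .Where(n => n.Name == "script" || n.Name == "style")
                     .ToArray())
        {
            node.Remove();
        }

        // InnerText leaves entities such as &amp; encoded, so decode them.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText).Trim();
    }

    public static void Main()
    {
        // Placeholder URL; HtmlWeb downloads and parses the page.
        var page = new HtmlWeb().Load("https://example.com/");
        Console.WriteLine(Extract(page.DocumentNode.OuterHtml));
    }
}
```

The result will still contain the whitespace and line breaks of the original markup, so some further normalization is probably needed before the text reads cleanly.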

Colin Pickard answered Oct 16 '22 07:10


You could try NSoup, although it currently has a few bugs:

http://nsoup.codeplex.com/

Adam answered Oct 16 '22 05:10


I know this question is quite old, but I'm posting this for future reference, since I came across it while searching for a similar solution.

I found a library built on top of Html Agility Pack called ScrapySharp.

I've used it in much the same way as I would BeautifulSoup: https://bitbucket.org/rflechner/scrapysharp/wiki/Home (EDIT: broken link; the project moved to https://github.com/rflechner/ScrapySharp)

EDIT: The package is available at https://www.nuget.org/packages/ScrapySharp/
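
As a rough illustration of the BeautifulSoup-like usage, here is a small sketch using the `CssSelect` extension method from the `ScrapySharp.Extensions` namespace. It assumes the ScrapySharp and HtmlAgilityPack NuGet packages are installed. The class name `CssSelectSketch` and the sample HTML are made up for the example.

```csharp
using System;
using HtmlAgilityPack;
using ScrapySharp.Extensions;

public static class CssSelectSketch
{
    public static void Main()
    {
        // Made-up sample markup, parsed with Html Agility Pack.
        var doc = new HtmlDocument();
        doc.LoadHtml("<div><p class='intro'>Hello</p><p>World</p></div>");

        // CssSelect takes a CSS selector, much like BeautifulSoup's select().
        foreach (var node in doc.DocumentNode.CssSelect("p.intro"))
        {
            Console.WriteLine(node.InnerText);
        }
    }
}
```

Because ScrapySharp works on Html Agility Pack nodes, the selected nodes expose the usual `InnerText`, `Attributes`, and `Remove()` members.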

Yavor Shahpasov answered Oct 16 '22 06:10