Has anyone integrated BeautifulSoup with ASP.NET/C# (possibly using IronPython or otherwise)? Is there a BeautifulSoup alternative or a port that works nicely with ASP.NET/C#?
The goal is to extract the readable text from an arbitrary URL.
Thanks
Html Agility Pack is a similar project, but for C# and .NET
EDIT:
To extract all readable text:
    doc.DocumentNode.InnerText
Note that this also returns the text content of <script> and <style> tags.
To fix that, remove those tags before reading InnerText, like this:
    foreach (var script in doc.DocumentNode.Descendants("script").ToArray())
        script.Remove();
    foreach (var style in doc.DocumentNode.Descendants("style").ToArray())
        style.Remove();
(Credit: SLaks)
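Putting those pieces together, here is a minimal sketch of a helper that takes an HTML string and returns its readable text. The class and method names (ReadableText, Extract) are made up for illustration, and the entity decoding and trimming at the end are my own additions rather than part of the answer above.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper; the names are illustrative, not part of Html Agility Pack.
static class ReadableText
{
    // Returns the text of an HTML string with <script> and <style> content removed.
    public static string Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // ToArray() takes a snapshot so nodes can be removed while iterating.
        var unwanted = doc.DocumentNode.Descendants()
            .Where(n => n.Name == "script" || n.Name == "style")
            .ToArray();
        foreach (var node in unwanted)
            node.Remove();

        // InnerText leaves entities such as &amp; encoded, so decode them here.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText).Trim();
    }
}
```

To start from a URL instead of a string, new HtmlWeb().Load(url) returns an HtmlDocument that can be cleaned the same way. The result will still contain whatever whitespace the page had, so you may want to collapse runs of blank lines afterwards.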
You could try NSoup, although it currently has a few bugs:
http://nsoup.codeplex.com/
I know this is quite old, but I decided to post this for future reference. I came across this searching for a similar solution.
I found a library built on top of Html Agility Pack called ScrapySharp.
I've used it in much the same way as I would BeautifulSoup: https://bitbucket.org/rflechner/scrapysharp/wiki/Home (EDIT: broken link, the project moved to https://github.com/rflechner/ScrapySharp)
EDIT: The package is available on NuGet: https://www.nuget.org/packages/ScrapySharp/
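For a rough idea of what that looks like, here is a small sketch using ScrapySharp's CssSelect extension method on an Html Agility Pack node. The HTML snippet and the selector are invented for the example, and it assumes both the HtmlAgilityPack and ScrapySharp packages are installed.

```csharp
using System;
using HtmlAgilityPack;
using ScrapySharp.Extensions;

class Program
{
    static void Main()
    {
        // Sample markup made up for this example.
        var doc = new HtmlDocument();
        doc.LoadHtml("<div class='post'><p>First</p><p>Second</p></div>");

        // CssSelect comes from ScrapySharp.Extensions and takes a CSS selector,
        // much like BeautifulSoup's select().
        foreach (var p in doc.DocumentNode.CssSelect("div.post p"))
            Console.WriteLine(p.InnerText);
    }
}
```

The CSS selector syntax is what makes it feel close to BeautifulSoup; plain Html Agility Pack would use XPath or LINQ for the same query.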