I just downloaded the HTMLAgilityPack and the documentation doesn't have any examples.
I'm looking for a way to get all the images from a website. I only need the address strings, not the image files themselves.
<img src="blabalbalbal.jpeg" />
I need to pull the source of each img tag. I just want to get a feel for the library and what it can offer. Everyone said this was the best tool for the job.
Edit
public void GetAllImages()
{
    WebClient x = new WebClient();
    string source = x.DownloadString(@"http://www.google.com");
    HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
    document.Load(source);

    //I can't use the Descendants method. It doesn't appear.
    var ImageURLS = document.desc
        .Select(e => e.GetAttributeValue("src", null))
        .Where(s => !String.IsNullOrEmpty(s));
}
For users who are unfamiliar with the Html Agility Pack (HAP): it is a free and open-source HTML parser written in C# that builds a read/write DOM and supports plain XPath or XSLT. In simple words, it is a .NET code library that allows you to parse "out of the web" files (be it HTML, PHP or aspx).
You can do this using LINQ, like this:
var document = new HtmlWeb().Load(url);
var urls = document.DocumentNode.Descendants("img")
    .Select(e => e.GetAttributeValue("src", null))
    .Where(s => !String.IsNullOrEmpty(s));
EDIT: This code now actually works; I had forgotten to write document.DocumentNode.
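A side note on the code in the question: document.Load(source) treats its string argument as a file path, so if you have already downloaded the markup into a string, LoadHtml is the call that parses it. Below is a minimal sketch of the same LINQ query wrapped in a method that takes the HTML as a string. The class name ImageUrlExtractor and the method name GetImageUrls are made up for illustration and are not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ImageUrlExtractor
{
    // Hypothetical helper (not part of HtmlAgilityPack):
    // returns the src value of every <img> element found in the given markup.
    public static List<string> GetImageUrls(string html)
    {
        var document = new HtmlDocument();

        // LoadHtml parses markup held in a string;
        // Load(string) would try to open a file at that path instead.
        document.LoadHtml(html);

        // Descendants lives on the root node, not on the document itself.
        return document.DocumentNode
            .Descendants("img")
            .Select(e => e.GetAttributeValue("src", null))
            .Where(s => !String.IsNullOrEmpty(s))
            .ToList();
    }
}
```

With the WebClient from the question, this could be called as ImageUrlExtractor.GetImageUrls(x.DownloadString(@"http://www.google.com")). The values come back exactly as written in the page, so relative addresses stay relative.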