How can I use HTML Agility Pack to retrieve all the images from a website?

I just downloaded the HTMLAgilityPack and the documentation doesn't have any examples.

I'm looking for a way to get all the images from a website: the address strings, not the image files themselves.

<img src="blabalbalbal.jpeg" /> 

I need to pull the src attribute of each img tag. I just want to get a feel for the library and what it can offer. Everyone said this was the best tool for the job.

Edit

public void GetAllImages()
{
    WebClient x = new WebClient();
    string source = x.DownloadString(@"http://www.google.com");

    HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
    document.Load(source);

    //I can't use the Descendants method. It doesn't appear.
    var ImageURLS = document.desc
                   .Select(e => e.GetAttributeValue("src", null))
                   .Where(s => !String.IsNullOrEmpty(s));
}
asked Jan 21 '10 at 23:01 by Sergio Tapia


People also ask

What is HTML agility pack?

For users who are unfamiliar with HTML Agility Pack: it is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT. In simple terms, it is a .NET code library that allows you to parse "out of the web" files (be they HTML, PHP, or ASPX).
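Since XPath support is mentioned above, here is a minimal sketch of the same image extraction written as an XPath query instead of LINQ. It assumes a reasonably recent HtmlAgilityPack package from NuGet; the class and method names (`XPathImageExtractor`, `GetImageSources`) are hypothetical, chosen only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper: collects the src of every <img> using XPath rather than Descendants().
public static class XPathImageExtractor
{
    public static List<string> GetImageSources(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html); // parse markup held in a string

        // "//img[@src]" selects <img> elements anywhere in the document that carry a src attribute.
        // By default SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = document.DocumentNode.SelectNodes("//img[@src]");
        if (nodes == null)
            return new List<string>();

        // Drop empty src values such as <img src="">.
        return nodes
            .Select(n => n.GetAttributeValue("src", null))
            .Where(s => !string.IsNullOrEmpty(s))
            .ToList();
    }
}
```

The null check matters: calling LINQ methods directly on the result of `SelectNodes` throws when the page has no matching images.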

Is HTML agility pack free?

Html Agility Pack (HAP) is a free and open-source HTML parser written in C# to read/write DOM and supports plain XPath or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.


1 Answer

You can do this using LINQ, like this:

var document = new HtmlWeb().Load(url);
var urls = document.DocumentNode.Descendants("img")
                                .Select(e => e.GetAttributeValue("src", null))
                                .Where(s => !String.IsNullOrEmpty(s));

EDIT: This code now actually works; I had forgotten to write document.DocumentNode.
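One more point about the question's own code: `HtmlDocument.Load(string)` expects a file path, so passing it the HTML returned by `WebClient.DownloadString` makes it try to open a file with that name instead of parsing the markup. When the page is already in a string, `LoadHtml` is the method to call. A minimal sketch, assuming the HTML has already been downloaded; `ImageSourceLister` and `FromHtml` are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper mirroring the question's flow: HTML is already downloaded as a string.
public static class ImageSourceLister
{
    public static List<string> FromHtml(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html); // LoadHtml parses a string; Load(string) would treat it as a path

        // Descendants is defined on HtmlNode, so start from document.DocumentNode.
        return document.DocumentNode.Descendants("img")
            .Select(e => e.GetAttributeValue("src", null))
            .Where(s => !string.IsNullOrEmpty(s))
            .ToList();
    }
}
```

With the question's code, that means replacing `document.Load(source)` with `document.LoadHtml(source)` and querying `document.DocumentNode.Descendants("img")`.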

answered Sep 19 '22 at 15:09 by SLaks