I am new to C# and I really need help with the following problem. I want to extract the URLs of photos on a webpage whose file names follow a specific pattern. For example, I want to extract all the images whose names match the pattern name_412s.jpg. I use the following code to extract images from HTML, but I do not know how to adapt it.
public void Images()
{
    WebClient x = new WebClient();
    string source = x.DownloadString(@"http://www.google.com");
    HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
    document.LoadHtml(source);
    List<string> images = new List<string>();
    foreach (HtmlNode link in document.DocumentNode.SelectNodes("//img"))
    {
        images.Add(link.GetAttributeValue("src", ""));
    }
}
I also need to write the results to an XML file. Can you also help me with that?
Thank you!
To limit the query results, you need to add a condition to your XPath. For instance, //img[contains(@src, 'name_412s.jpg')] will limit the results to only img elements that have an src attribute that contains that file name.
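To see the contains() condition in isolation, here is a minimal sketch. It assumes a small, well-formed fragment and uses System.Xml instead of HtmlAgilityPack so that it runs without the extra package; the XPath expression itself is the same. The class name XPathFilterDemo and the method MatchingSources are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathFilterDemo
{
    // Hypothetical helper: returns the src values of all img elements
    // whose src attribute contains the given fragment.
    // Assumes the markup is well-formed XML and that the fragment
    // contains no single quotes (it is pasted into the XPath literal).
    public static List<string> MatchingSources(string markup, string fragment)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(markup);

        List<string> sources = new List<string>();
        string xpath = "//img[contains(@src, '" + fragment + "')]";

        // img elements without a src attribute simply do not match.
        foreach (XmlElement img in doc.SelectNodes(xpath))
        {
            sources.Add(img.GetAttribute("src"));
        }
        return sources;
    }
}
```

Calling XPathFilterDemo.MatchingSources(markup, "_412s.jpg") on a fragment with several img elements returns only the src values that contain _412s.jpg. Note that contains() matches the text anywhere in the attribute, not only at the end.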
As for writing out the results to XML, you'll need to create a new XML document and then copy the matching elements into it. Since you won't be able to directly import an HtmlAgilityPack node into an XmlDocument, you'll have to manually copy all the attributes. For instance:
using System.Net;
using System.Xml;
using HtmlAgilityPack;
// ...
public void Images()
{
    WebClient x = new WebClient();
    string source = x.DownloadString(@"http://www.google.com");
    HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
    // LoadHtml parses an HTML string; Load expects a file path or stream.
    document.LoadHtml(source);
    XmlDocument output = new XmlDocument();
    XmlElement imgElements = output.CreateElement("ImgElements");
    output.AppendChild(imgElements);
    // SelectNodes returns null when nothing matches, so check before iterating.
    HtmlNodeCollection links = document.DocumentNode.SelectNodes("//img[contains(@src, '_412s.jpg')]");
    if (links != null)
    {
        foreach (HtmlNode link in links)
        {
            // Recreate each img element in the output document, attribute by attribute.
            XmlElement img = output.CreateElement(link.Name);
            foreach (HtmlAttribute a in link.Attributes)
            {
                img.SetAttribute(a.Name, a.Value);
            }
            imgElements.AppendChild(img);
        }
    }
    output.Save(@"C:\test.xml");
}