I'm looking for a regular expression to isolate the src value of an img. (I know that this is not the best way to do this but this is what I have to do in this case)
I have a string which contains simple html code, some text and an image. I need to get the value of the src attribute from that string. I have managed only to isolate the whole tag till now.
string matchString = Regex.Match(original_text, @"(<img([^>]+)>)").Value;
string matchString = Regex.Match(original_text, "<img.+?src=[\"'](.+?)[\"'].*?>", RegexOptions.IgnoreCase).Groups[1].Value;
I know you say you have to use regex, but if possible i would really give this open source project a chance: HtmlAgilityPack
It is really easy to use, I just discovered it and it helped me out a lot, since I was doing some heavier html parsing. It basically lets you use XPATHS to get your elements.
Their example page is a little outdated, but the API is really easy to understand, and if you are a little bit familiar with xpaths you will get head around it in now time
The code for your query would look something like this: (uncompiled code)
List<string> imgScrs = new List<string>();
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(htmlText);//or doc.Load(htmlFileStream)
var nodes = doc.DocumentNode.SelectNodes(@"//img[@src]"); s
foreach (var img in nodes)
{
HtmlAttribute att = img["src"];
imgScrs.Add(att.Value)
}
I tried what Francisco Noriega suggested, but it looks that the api to the HtmlAgilityPack has been altered. Here is how I solved it:
List<string> images = new List<string>();
WebClient client = new WebClient();
string site = "http://www.mysite.com";
var htmlText = client.DownloadString(site);
var htmlDoc = new HtmlDocument()
{
OptionFixNestedTags = true,
OptionAutoCloseOnEnd = true
};
htmlDoc.LoadHtml(htmlText);
foreach (HtmlNode img in htmlDoc.DocumentNode.SelectNodes("//img"))
{
HtmlAttribute att = img.Attributes["src"];
images.Add(att.Value);
}
This should capture all img tags and just the src part no matter where its located (before or after class etc) and supports html/xhtml :D
<img.+?src="(.+?)".+?/?>
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With