Why is this regex being greedy?

Question

I am trying to extract all links that contain /thumbs/ and sit inside double quotes. Actually, I only need the images' src. I don't know whether the images will end with jpg, whether there will be case-sensitivity problems, etc. I really only care about the full link.

    m = Regex.Match(page, @"""(.+?/thumbs/.+?)""");
    //...
    var thumbUrl = m.Groups[1].Value;

My full code

    var page = DownloadWebPage(url);
    var reg = new Regex(@"Elements\s+$(.*)$", RegexOptions.Multiline);
    var m = reg.Match(page);
    var szEleCount = m.Groups[1].Value;
    int eleCount = int.Parse(szEleCount);

    m = Regex.Match(page, @"""(.+?/thumbs/.+?)""");
    while (m.Success)
    {
        var thumbUrl = m.Groups[1].Value;
        // I break here to see the problem
        m = m.NextMatch();
    }

thumbUrl looks like

center\"> ... lot of text, no /thumbs/ ... src=\"http://images.fdhkdhfkd.com/thumbs/dfljdkl/22350.jpg

Andomar · Accepted Answer

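The regex is not really being greedy. A lazy quantifier only makes a match end as early as possible; the match still starts at the leftmost position where one can succeed. Here that is an earlier double quote, and because . also matches ", the first .+? keeps expanding across later quotes until it reaches /thumbs/. Non-greedy expressions can also be slow, because the engine has to retry the rest of the pattern after every character it adds.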

This one uses only greedy expressions:

@"""([^""]*/thumbs/[^""]*)"""

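Instead of matching as little of anything as possible, it matches as many non-double-quote characters as it can, so the captured text can never cross a quote and stays inside a single quoted string.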
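To compare the two patterns side by side, here is a minimal sketch that runs both over a made-up line of HTML. The class name ThumbLinks, the Extract helper, and the sample page are assumptions for illustration only; they are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, introduced only for this comparison.
public static class ThumbLinks
{
    // Lazy pattern from the question: '.' also matches '"', so the capture
    // can run across several quoted strings before it reaches /thumbs/.
    public static readonly Regex Lazy = new Regex(@"""(.+?/thumbs/.+?)""");

    // Negated-class pattern from the answer: the capture cannot contain '"'.
    public static readonly Regex Bounded = new Regex(@"""([^""]*/thumbs/[^""]*)""");

    // Collects group 1 of every match, using the same Match/NextMatch
    // loop as the code in the question.
    public static List<string> Extract(Regex pattern, string page)
    {
        var urls = new List<string>();
        for (var m = pattern.Match(page); m.Success; m = m.NextMatch())
        {
            urls.Add(m.Groups[1].Value);
        }
        return urls;
    }
}
```

With the sample page `<div align="center"><img src="http://example.com/thumbs/a/1.jpg"></div>`, `ThumbLinks.Extract(ThumbLinks.Lazy, page)` yields `center"><img src="http://example.com/thumbs/a/1.jpg`, while `ThumbLinks.Extract(ThumbLinks.Bounded, page)` yields only `http://example.com/thumbs/a/1.jpg`.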

Tags: c#, regex, non-greedy
