C# Regex bug with URL

Question

I'm parsing a file of URLs to get the host and URI part of each one, but there is a bug when the URL does not end with a slash.

C# code:

var URL = Regex.Match(link, @"(?:.*?//)?(.*?)(/.*)", RegexOptions.IgnoreCase);

Input:

//cdn.sstatic.net/stackoverflow/img/favicon.ico
/opensearch.xml
http://stackoverflow.com/
http://careers.stackoverflow.com

Output:

//cdn.sstatic.net/stackoverflow/img/favicon.ico has 2 groups:
    cdn.sstatic.net
    /stackoverflow/img/favicon.ico

/opensearch.xml has 2 groups:

    /opensearch.xml

http://stackoverflow.com/ has 2 groups:
    stackoverflow.com
    /

http://careers.stackoverflow.com has 2 groups:
    http:
    //careers.stackoverflow.com

Every URL in the output is parsed correctly except http://careers.stackoverflow.com. How can I make the last part conditional, something like "if there is a slash, stop at the first one; otherwise grab everything"?

Samuel Neff · Accepted Answer

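Add |$ to your last group so that it matches either a slash followed by the rest of the URL or the end of the input. In your original pattern the last group must contain a slash, so when nothing follows the host the regex engine backtracks, skips the optional scheme prefix, and uses the "//" after "http:" as the start of the path. That is why group 1 ends up as "http:".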

This works for your inputs:

using System;
using System.Text.RegularExpressions;

var links = new[]
    {
        "//cdn.sstatic.net/stackoverflow/img/favicon.ico",
        "/opensearch.xml",
        "http://stackoverflow.com/",
        "http://careers.stackoverflow.com"
    };

foreach (string link in links)
{
    // Group 1 is the host; group 2 is the path, or empty when no slash follows the host.
    var u = Regex.Match(link, @"(?:.*?//)?(.*?)(/.*|$)", RegexOptions.IgnoreCase);
    Console.WriteLine(link);
    Console.WriteLine("    " + u.Groups[1]);
    Console.WriteLine("    " + u.Groups[2]);
    Console.WriteLine();
}

Output:

//cdn.sstatic.net/stackoverflow/img/favicon.ico
    cdn.sstatic.net
    /stackoverflow/img/favicon.ico

/opensearch.xml

    /opensearch.xml

http://stackoverflow.com/
    stackoverflow.com
    /

http://careers.stackoverflow.com
    careers.stackoverflow.com
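
To see the effect of |$ in isolation, here is a minimal sketch that runs both patterns against the one URL that failed. The class name PatternComparison and the "host"/"path" labels are only illustrative and do not come from the question.

```csharp
using System;
using System.Text.RegularExpressions;

class PatternComparison
{
    static void Main()
    {
        const string link = "http://careers.stackoverflow.com";

        // Pattern from the question: group 2 must start with a slash.
        const string original = @"(?:.*?//)?(.*?)(/.*)";
        // Pattern from this answer: group 2 may also be empty at the end of the input.
        const string corrected = @"(?:.*?//)?(.*?)(/.*|$)";

        foreach (string pattern in new[] { original, corrected })
        {
            Match m = Regex.Match(link, pattern);
            Console.WriteLine(pattern);
            Console.WriteLine("    host: '" + m.Groups[1].Value + "'");
            Console.WriteLine("    path: '" + m.Groups[2].Value + "'");
        }
    }
}
```

With the original pattern, the host comes out as 'http:' and the path as '//careers.stackoverflow.com'. With the corrected pattern, the host is 'careers.stackoverflow.com' and the path is an empty string.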

Tags: c#, regex, url, parsing

Naster