I'm parsing a file of URLs to get the host and URI parts, but there is a bug when the URL does not end with a slash.
C# code:
var URL = Regex.Match(link, @"(?:.*?//)?(.*?)(/.*)", RegexOptions.IgnoreCase);
Input:
//cdn.sstatic.net/stackoverflow/img/favicon.ico
/opensearch.xml
http://stackoverflow.com/
http://careers.stackoverflow.com
Output:
//cdn.sstatic.net/stackoverflow/img/favicon.ico has 2 groups:
cdn.sstatic.net
/stackoverflow/img/favicon.ico
/opensearch.xml has 2 groups:
/opensearch.xml
http://stackoverflow.com/ has 2 groups:
stackoverflow.com
/
http://careers.stackoverflow.com has 2 groups:
http:
//careers.stackoverflow.com
Every URL in the output is parsed correctly except for http://careers.stackoverflow.com. How can I make the last part optional, something like "if there is a slash, stop at the first one, or else grab everything"?
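Add |$ to your last group so that it matches either a path starting with a slash or the end of the string. In your original pattern the last group requires a slash, so when nothing follows the host the regex backtracks, skips the optional scheme prefix, and takes the first slash of "//" as the start of the path. That is why you get http: and //careers.stackoverflow.com.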
This works for your inputs:
using System;
using System.Text.RegularExpressions;

var links = new[]
{
    "//cdn.sstatic.net/stackoverflow/img/favicon.ico",
    "/opensearch.xml",
    "http://stackoverflow.com/",
    "http://careers.stackoverflow.com"
};
foreach (string link in links)
{
    // Group 1 is the host; group 2 is the path, or empty when the URL has no path.
    var u = Regex.Match(link, @"(?:.*?//)?(.*?)(/.*|$)", RegexOptions.IgnoreCase);
    Console.WriteLine(link);
    Console.WriteLine(" " + u.Groups[1]);
    Console.WriteLine(" " + u.Groups[2]);
    Console.WriteLine();
}
Output:
//cdn.sstatic.net/stackoverflow/img/favicon.ico
cdn.sstatic.net
/stackoverflow/img/favicon.ico
/opensearch.xml
/opensearch.xml
http://stackoverflow.com/
stackoverflow.com
/
http://careers.stackoverflow.com
careers.stackoverflow.com
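If you need the two parts in more than one place, the same pattern can be wrapped in a small helper. This is only a sketch: the class name UrlSplitter, the method Split, the group names host and path, and the leading ^ anchor are my own additions for illustration and are not part of your code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlSplitter
{
    // Optional "scheme://" or "//" prefix, then a lazily matched host,
    // then either a path starting at the first slash or the end of the string.
    // The ^ anchor and the group names are assumptions added for this sketch.
    private static readonly Regex Pattern =
        new Regex(@"^(?:.*?//)?(?<host>.*?)(?<path>/.*|$)");

    // Returns the host and the path; either one may be an empty string.
    public static (string Host, string Path) Split(string link)
    {
        Match m = Pattern.Match(link);
        return (m.Groups["host"].Value, m.Groups["path"].Value);
    }

    public static void Main()
    {
        foreach (string link in new[] { "http://careers.stackoverflow.com", "/opensearch.xml" })
        {
            var (host, path) = Split(link);
            Console.WriteLine($"{link} -> host='{host}', path='{path}'");
        }
    }
}
```

For the URL without a trailing slash, the path comes back as an empty string rather than as a failed match, so the caller can decide whether to treat it as "/".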