Just so everybody understands the vocabulary involved, the general structure of a URL is as follows:
http :// www.a.com / path/to/resource.html ? query=value # fragment
{scheme} :// {authority} / {path} ? {query} # {fragment}
The path consists of a path and a resource. In the case of path/to/resource.html, the path is path/to/ and the resource is resource.html.
Poor, Nasty and Brutish:
HTML, as it is found in the wild, can be poor, nasty and brutish, though quite often far from short. In this poor, nasty and brutish world happen to live links, which in themselves can be poor, nasty and brutish, despite the fact that URLs are supposed to adhere to the standards. So with this in mind, I present you the problem...
Problem:
I'm trying to create a regex to remove the resource from a URL's path, which is necessary when there is a link within a web page that is a relative path. For example:
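1. Start with the URL of the page: www.domain.com/path/to/page1.html
2. Find a relative link in that page: ./page2.html
3. Remove the resource /page1.html from the URL
4. Append /page2.html to www.domain.com/path/to
Result: www.domain.com/path/to/page2.html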
I'm stuck on step 3!
I've isolated the path and resource, but now I want to separate the two. The regex I tried to come up with looks like this: \z([^\/]\.[^\/])
In C# the same regex is: "\\z([^/]\\.[^/])"
Translated into English, the regex is supposed to mean: match the end of the string, which includes all characters separated by a dot, as long as those characters are not slashes.
I tried that regular expression, but it fails miserably. What is the proper regex to achieve this result?
Here are some sample cases:
/path/to/resource.html => /path/to/ and resource.html
/pa.th/to/resource.html => /pa.th/to/ and resource.html
/path/to/resource.html/ => /path/to/resource.html/
/*I#$>/78zxdc.78&(!~ => /*I#$>/ and 78zxdc.78&(!~
Thanks for your help!
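One reason the attempt above cannot match: \z is a zero-width anchor for the very end of the string, so nothing placed after it can ever be consumed, and each [^/] matches only a single character because it has no quantifier. A minimal sketch of a pattern that would produce the splits in the sample cases, assuming the input is already the isolated path (the class and method names here are hypothetical, chosen only for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class PathSplitter
{
    // Greedy ".*/" runs up to the LAST slash; "[^/]*" captures whatever follows it.
    private static readonly Regex SplitPattern = new Regex(@"^(.*/)([^/]*)$");

    // Returns the directory part (with trailing slash) and the resource part.
    public static (string Directory, string Resource) Split(string path)
    {
        Match m = SplitPattern.Match(path);
        if (!m.Success)
        {
            // No slash at all: treat the whole input as the resource.
            return ("", path);
        }
        return (m.Groups[1].Value, m.Groups[2].Value);
    }

    public static void Main()
    {
        string[] samples =
        {
            "/path/to/resource.html",
            "/pa.th/to/resource.html",
            "/path/to/resource.html/",
            "/*I#$>/78zxdc.78&(!~"
        };

        foreach (string p in samples)
        {
            var (dir, res) = Split(p);
            Console.WriteLine("{0} => '{1}' and '{2}'", p, dir, res);
        }
    }
}
```

Note that the pattern does not look for a dot at all. The split depends only on the last slash, which is why /pa.th/to/resource.html is handled correctly and why a path ending in a slash yields an empty resource.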
System.Uri can parse the URL into its components without a regex:
var uri = new Uri("http://www.domain.com/path/to/page1.html?query=value#fragment");
Console.WriteLine(uri.Scheme); // http
Console.WriteLine(uri.Host); // www.domain.com
Console.WriteLine(uri.AbsolutePath); // /path/to/page1.html
Console.WriteLine(uri.PathAndQuery); // /path/to/page1.html?query=value
Console.WriteLine(uri.Query); // ?query=value
Console.WriteLine(uri.Fragment); // #fragment
Console.WriteLine(uri.Segments[uri.Segments.Length - 1]); // page1.html
// Print every path segment with its index.
for (var i = 0; i < uri.Segments.Length; i++)
{
    Console.WriteLine("{0}: {1}", i, uri.Segments[i]);
}
/*
Output
0: /
1: path/
2: to/
3: page1.html
*/
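If the end goal is to turn a relative link into an absolute one, the Uri(Uri, String) constructor resolves the link against a base URL, so the resource may not need to be stripped by hand at all. The following is a rough sketch rather than a definitive solution. It assumes the page URL includes a scheme (a bare www.domain.com/... cannot be parsed as an absolute Uri), and the RelativeLinks class and its method names are hypothetical:

```csharp
using System;

public static class RelativeLinks
{
    // Resolves a link found in a page against that page's absolute URL.
    public static Uri Resolve(string pageUrl, string link)
    {
        var baseUri = new Uri(pageUrl);   // throws UriFormatException if not absolute
        return new Uri(baseUri, link);    // standard relative-reference resolution
    }

    // Returns the path without its resource, keeping the trailing slash.
    public static string DirectoryOf(Uri uri)
    {
        string path = uri.AbsolutePath;   // begins with "/" for http URLs
        return path.Substring(0, path.LastIndexOf('/') + 1);
    }

    public static void Main()
    {
        const string page = "http://www.domain.com/path/to/page1.html";

        Console.WriteLine(Resolve(page, "./page2.html")); // http://www.domain.com/path/to/page2.html
        Console.WriteLine(Resolve(page, "/page2.html"));  // http://www.domain.com/page2.html
        Console.WriteLine(DirectoryOf(new Uri(page)));    // /path/to/
    }
}
```

A link that starts with a slash is resolved against the host root rather than the current directory, which is the difference between the first two lines of output.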