 

How can I reliably get the actual URL, even when there are percent-encoded parts in the path?

IIS and ASP.NET (MVC) have some glitches when working with URLs that have %-encoding in the path (not the query string; the query string is fine). How can I get around this, i.e. how can I get the actual URL that was requested?

For example, if I navigate to /x%3Fa%3Db and (separately) to /x?a=b, both of them report Request.Url as /x?a=b, because the percent-encoded data in the path is reported in its decoded form.
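To see why the two requests collide, here is a minimal sketch that models only the decoding step. It assumes the path is decoded with something equivalent to Uri.UnescapeDataString. DecodePath is a hypothetical helper, and this is not how IIS itself is implemented. It is written as C# 9 top-level statements:

```csharp
using System;

// Rough model of the symptom: once the *path* has been percent-decoded,
// "/x%3Fa%3Db" and "/x?a=b" are the same string, so anything built from the
// decoded form cannot tell the two requests apart.
string DecodePath(string rawPath) => Uri.UnescapeDataString(rawPath);

string encoded = DecodePath("/x%3Fa%3Db"); // "?" and "=" were data inside the path
string literal = "/x?a=b";                 // "?" really starts a query string

Console.WriteLine(encoded);            // /x?a=b
Console.WriteLine(encoded == literal); // True
```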

Asked by Marc Gravell, Feb 27 '13 16:02




1 Answer

The way I've tackled this is to look at the underlying server variables: the URL variable contains a decoded value, while the QUERY_STRING variable contains the still-encoded query. We can't just call encode on the URL part, because it also contains the original / separators (and so on) in their original form; if we blindly encode the entire thing we'll get unwanted %2f values. However, we can pull it apart and spot the problematic cases:

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;
using System.Web;

// Extension methods must live in a non-nested static class; the class name is arbitrary.
public static class HttpRequestExtensions
{
    // Paths/segments made only of these characters need no re-encoding.
    private static readonly Regex simpleUrlPath = new Regex("^[-a-zA-Z0-9_/]*$", RegexOptions.Compiled);
    private static readonly char[] segmentsSplitChars = { '/' };
    // ^^^ avoids lots of gen-0 arrays being created when calling .Split

    public static Uri GetRealUrl(this HttpRequest request)
    {
        if (request == null) throw new ArgumentNullException("request");
        var baseUri = request.Url; // use this primarily to avoid needing to process the protocol / authority
        try
        {
            var vars = request.ServerVariables;
            var url = vars["URL"];
            if (string.IsNullOrEmpty(url) || simpleUrlPath.IsMatch(url)) return baseUri; // nothing to do - looks simple enough even for IIS

            var query = vars["QUERY_STRING"];
            // here's the thing: url contains *decoded* values; query contains *encoded* values

            // loop over the segments, encoding each separately
            var sb = new StringBuilder(url.Length * 2); // allow double to be pessimistic; we already expect trouble
            var segments = url.Split(segmentsSplitChars);
            foreach (var segment in segments)
            {
                if (segment.Length == 0)
                {
                    if (sb.Length != 0) sb.Append('/'); // keep trailing / repeated separators
                }
                else if (simpleUrlPath.IsMatch(segment))
                {
                    sb.Append('/').Append(segment);
                }
                else
                {
                    // Uri.EscapeDataString encodes a space as %20; HttpUtility.UrlEncode would
                    // emit '+', which means a literal plus sign in a path, not a space.
                    sb.Append('/').Append(Uri.EscapeDataString(segment));
                }
            }
            if (!string.IsNullOrEmpty(query)) sb.Append('?').Append(query); // query is already encoded; append as-is
            return new Uri(baseUri, sb.ToString());
        }
        catch (Exception ex)
        { // if something unexpected happens, default to the broken ASP.NET handling
            GlobalApplication.LogException(ex); // application-specific logging hook; substitute your own
            return baseUri;
        }
    }
}
```
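The core of the answer is the segment-by-segment re-encoding. Below is a simplified, standalone sketch of just that step, written as C# 9 top-level statements. RebuildPathAndQuery is a hypothetical helper name. The sketch skips the regex fast path and the HttpRequest plumbing, and simply runs Uri.EscapeDataString over every segment:

```csharp
using System;
using System.Text;

// Re-encode a *decoded* path one segment at a time, keeping the "/"
// separators literal, then append the query string, which is still
// encoded and can be used as-is.
string RebuildPathAndQuery(string decodedPath, string rawQuery)
{
    var sb = new StringBuilder();
    string[] segments = decodedPath.Split('/');
    for (int i = 0; i < segments.Length; i++)
    {
        if (i > 0) sb.Append('/');                    // restore the separator
        sb.Append(Uri.EscapeDataString(segments[i])); // encode only the data
    }
    if (!string.IsNullOrEmpty(rawQuery)) sb.Append('?').Append(rawQuery);
    return sb.ToString();
}

// Encoding the whole decoded path at once also escapes the separators:
Console.WriteLine(Uri.EscapeDataString("/x?a=b/y"));    // %2Fx%3Fa%3Db%2Fy
// Segment by segment, the "/" survives and only the data is escaped:
Console.WriteLine(RebuildPathAndQuery("/x?a=b/y", "")); // /x%3Fa%3Db/y
Console.WriteLine(RebuildPathAndQuery("/x", "a=b"));    // /x?a=b
```

Because the split happens on the already-decoded path, a %2F that was originally inside a segment comes back as a plain / separator; that case cannot be recovered from the decoded value alone.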
Answered by Marc Gravell, Sep 21 '22 23:09