Per MSDN, URLEncode converts characters as follows:
- Spaces ( ) are converted to plus signs (+).
- Non-alphanumeric characters are escaped to their hexadecimal representation.
This is similar to, but not exactly the same as, the W3C definition of application/x-www-form-urlencoded:
This is the default content type. Forms submitted with this content type must be encoded as follows:
Control names and values are escaped. Space characters are replaced by '+', and then reserved characters are escaped as described in RFC1738, section 2.2: Non-alphanumeric characters are replaced by '%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., '%0D%0A').
The control names/values are listed in the order they appear in the document. The name is separated from the value by '=' and name/value pairs are separated from each other by '&'.
My question is, has anyone done the work to determine whether URLEncode produces valid x-www-form-urlencoded data?
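For concreteness, the W3C rules quoted above can be sketched as a small routine. This is an illustrative Python re-implementation of the quoted algorithm (space to '+', other non-alphanumerics to '%HH', pairs joined with '=' and '&'), not the .NET code in question:

```python
def form_urlencode(pairs):
    """Encode (name, value) pairs per the W3C rules quoted above.
    Illustrative sketch only; escapes every non-alphanumeric
    character, with space mapped to '+'."""
    def escape(s):
        out = []
        for ch in s:
            if ch.isascii() and ch.isalnum():
                out.append(ch)
            elif ch == ' ':
                out.append('+')
            else:
                # Percent-encode each byte of the character
                out.extend('%%%02X' % b for b in ch.encode('utf-8'))
        return ''.join(out)
    return '&'.join(escape(n) + '=' + escape(v) for n, v in pairs)

print(form_urlencode([('name', 'John Doe'), ('note', 'a&b')]))
# name=John+Doe&note=a%26b
```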
Well, the documentation you linked to is for IIS 6 Server.UrlEncode, but your title seems to ask about .NET System.Web.HttpUtility.UrlEncode. Using a tool like Reflector, we can see the implementation of the latter and determine if it meets the W3C spec.
Here is the encoding routine that is ultimately called (note that it is defined for an array of bytes; the other overloads that take strings eventually convert those strings to byte arrays and call this method). You would call this for each control name and value separately, to avoid escaping the reserved characters = and & used as separators.
protected internal virtual byte[] UrlEncode(byte[] bytes, int offset, int count)
{
    if (!ValidateUrlEncodingParameters(bytes, offset, count))
    {
        return null;
    }

    // First pass: count the spaces and the characters that need %HH escaping.
    int num = 0;   // spaces
    int num2 = 0;  // characters to escape
    for (int i = 0; i < count; i++)
    {
        char ch = (char) bytes[offset + i];
        if (ch == ' ')
        {
            num++;
        }
        else if (!HttpEncoderUtility.IsUrlSafeChar(ch))
        {
            num2++;
        }
    }
    if ((num == 0) && (num2 == 0))
    {
        return bytes;
    }

    // Second pass: each escaped character expands to three bytes (%HH).
    byte[] buffer = new byte[count + (num2 * 2)];
    int num4 = 0;
    for (int j = 0; j < count; j++)
    {
        byte num6 = bytes[offset + j];
        char ch2 = (char) num6;
        if (HttpEncoderUtility.IsUrlSafeChar(ch2))
        {
            buffer[num4++] = num6;
        }
        else if (ch2 == ' ')
        {
            buffer[num4++] = 0x2b; // '+'
        }
        else
        {
            buffer[num4++] = 0x25; // '%'
            buffer[num4++] = (byte) HttpEncoderUtility.IntToHex((num6 >> 4) & 15);
            buffer[num4++] = (byte) HttpEncoderUtility.IntToHex(num6 & 15);
        }
    }
    return buffer;
}
public static bool IsUrlSafeChar(char ch)
{
    if ((((ch >= 'a') && (ch <= 'z')) || ((ch >= 'A') && (ch <= 'Z'))) || ((ch >= '0') && (ch <= '9')))
    {
        return true;
    }
    switch (ch)
    {
        case '(':
        case ')':
        case '*':
        case '-':
        case '.':
        case '_':
        case '!':
            return true;
    }
    return false;
}
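To sanity-check the behavior, the two methods above can be ported almost line for line. This is a hedged Python sketch mirroring the decompiled routine, not the framework code itself:

```python
URL_SAFE = set("()*-._!")

def is_url_safe_char(ch):
    # Mirrors HttpEncoderUtility.IsUrlSafeChar: ASCII alphanumerics
    # plus the seven special characters above.
    return ch.isascii() and (ch.isalnum() or ch in URL_SAFE)

def url_encode(data: bytes) -> bytes:
    # Mirrors the byte-array UrlEncode overload shown above:
    # safe bytes pass through, space becomes '+', the rest become %HH.
    out = bytearray()
    for b in data:
        ch = chr(b)
        if is_url_safe_char(ch):
            out.append(b)
        elif ch == ' ':
            out.append(ord('+'))
        else:
            out.extend(b'%%%02X' % b)
    return bytes(out)

print(url_encode(b'a b$c\r\n'))
# b'a+b%24c%0D%0A'
```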
The first part of the routine counts the number of characters that need to be replaced (spaces and non-URL-safe characters). The second part allocates a new buffer and performs the replacements: URL-safe characters (a-z A-Z 0-9 and ()*-._!) are copied through unchanged, spaces become '+', and every other byte becomes '%HH'.
RFC1738 states (emphasis mine):

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL. On the other hand, characters that are not required to be encoded (including alphanumerics) may be encoded within the scheme-specific part of a URL, as long as they are not being used for a reserved purpose.
The set of URL-safe characters allowed by UrlEncode is a subset of the special characters defined in RFC1738. Namely, the characters $ + ' , are missing and will be encoded by UrlEncode even though the spec says they are safe. Since the spec says they may be used unencoded (and not must), it still meets the spec to encode them (and the second quoted paragraph states that explicitly).
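That difference can be checked mechanically. A quick illustrative computation, assuming the two character sets quoted above:

```python
# RFC1738's special characters vs. the characters IsUrlSafeChar allows
rfc1738_safe = set("$-_.+!*'(),")
urlencode_safe = set("()*-._!")

# The RFC-safe characters that UrlEncode will still percent-encode
print(sorted(rfc1738_safe - urlencode_safe))
# ['$', "'", '+', ',']
```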
With respect to line breaks: if the input contains a CR LF sequence, it will be escaped to %0D%0A. However, if the input contains only LF, it will be escaped to %0A, so the routine performs no normalization of line breaks.

Bottom line: it meets the specification, while additionally encoding $ + ' , and the caller is responsible for providing suitably normalized line breaks in the input.
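Since the routine does not normalize line breaks itself, the caller can do so before encoding. A minimal illustrative sketch in Python (the helper name is hypothetical):

```python
import re

def normalize_newlines(s: str) -> str:
    # Convert lone LF or lone CR to CR LF, so that the encoder
    # subsequently emits %0D%0A pairs as the W3C rules require.
    return re.sub(r'\r\n|\r|\n', '\r\n', s)

print(normalize_newlines('line1\nline2'))
# 'line1\r\nline2'
```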