Are %20 and + the same in an HTTP URL? [duplicate]

Tags: http, urlencode

I know that %20 and + both decode to the same value (a space), and on most web servers, especially those that map URLs to physical files, they will point to the same resource.

But my question is: must a URL like http://www.example.org/hello%20world point to the same resource as http://www.example.org/hello+world? Are they canonically the same?

In HTTP/1.0, + didn't map to a space, so I'm asking specifically about HTTP/1.1.

asked Oct 21 '10 20:10 by Evert




1 Answer

Only within the query string. The plus sign is a reserved character, so it must be encoded to pass an actual '+' in either the path or the query string. Its use as a substitute for spaces comes from a W3C Recommendation that applies only to the query string:

Within the query string, the plus sign is reserved as shorthand notation for a space. Therefore, real plus signs must be encoded. This method was used to make query URIs easier to pass in systems which did not allow spaces.
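To illustrate the query-string convention, here is a small sketch using Python's `urllib.parse` (chosen only as an example of a form-style encoder/decoder; other libraries may behave differently). It shows a space being written as '+', a literal '+' being escaped as %2B, and both '+' and %20 decoding to a space inside a query string.

```python
from urllib.parse import parse_qs, urlencode

# Encoding a query: spaces become '+', a literal '+' must become %2B.
query = urlencode({"q": "a+b c"})
print(query)  # q=a%2Bb+c

# Decoding a query: '+' and %20 both read back as a space,
# while %2B reads back as a literal plus sign.
print(parse_qs("q=hello+world"))    # {'q': ['hello world']}
print(parse_qs("q=hello%20world"))  # {'q': ['hello world']}
print(parse_qs("q=a%2Bb"))          # {'q': ['a+b']}
```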

URI Comparison (RFC 2616, section 3.2.3):

When comparing two URIs to decide if they match or not, a client SHOULD use a case-sensitive octet-by-octet comparison of the entire URIs, with these exceptions:

  - A port that is empty or not given is equivalent to the default
    port for that URI-reference;

  - Comparisons of host names MUST be case-insensitive;

  - Comparisons of scheme names MUST be case-insensitive;

  - An empty abs_path is equivalent to an abs_path of "/".

Characters other than those in the "reserved" and "unsafe" sets (see RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.
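The comparison rules quoted above can be sketched in Python. This is a minimal, hypothetical illustration, not a reference implementation: the helper names `comparison_key` and `uris_equivalent` are invented here, userinfo is ignored, non-ASCII octets are left encoded, and the "unsafe" set is an assumption (RFC 2396's `delims` and `unwise` characters plus control characters and space). Under these rules %20 stays encoded (space is unsafe) and '+' stays literal (it is reserved), so the two URLs from the question compare as different.

```python
import re
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}
RESERVED = set(";/?:@&=+$,")         # RFC 2396 reserved set
UNSAFE = set('<>#%"{}|\\^[]`')       # assumption: RFC 2396 delims + unwise
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def _decode_safe(component: str) -> str:
    """Decode %XX escapes only for characters that are neither reserved nor unsafe."""
    def repl(match):
        code = int(match.group(1), 16)
        # Control characters, space, DEL and non-ASCII octets stay encoded.
        if code <= 0x20 or code >= 0x7F:
            return match.group(0)
        ch = chr(code)
        if ch in RESERVED or ch in UNSAFE:
            return match.group(0)
        return ch
    return _ESCAPE.sub(repl, component)


def comparison_key(uri: str):
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()                   # scheme is case-insensitive
    host = parts.hostname or ""                     # hostname is returned lower-cased
    port = parts.port or DEFAULT_PORTS.get(scheme)  # missing port -> default port
    path = parts.path or "/"                        # empty abs_path -> "/"
    return (scheme, host, port,
            _decode_safe(path), _decode_safe(parts.query), _decode_safe(parts.fragment))


def uris_equivalent(a: str, b: str) -> bool:
    return comparison_key(a) == comparison_key(b)


print(uris_equivalent("http://www.example.org/hello%20world",
                      "http://www.example.org/hello+world"))  # False
print(uris_equivalent("HTTP://Example.ORG:80", "http://example.org/"))  # True
```

Note that RFC 2616 compares hex digits in escapes octet-by-octet, so this sketch treats %2f and %2F as different; later URI specifications normalize hex case.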

Reserved characters (RFC 2396)

";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","

So, in short: nothing official declares them to be the same thing. Using a literal '+' to direct http://example.org/hello+world to a directory called hello+world is incorrect, but nothing says it should instead be treated as equivalent to a space either.
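Whether hello+world ends up meaning "hello world" therefore depends on which decoding rule a server applies to the path. As a sketch, again using Python's `urllib.parse` only as an example, path-style decoding (`unquote`) leaves '+' alone, while form-style decoding (`unquote_plus`) also turns it into a space:

```python
from urllib.parse import unquote, unquote_plus

paths = ["/hello%20world", "/hello+world"]

# Path-style decoding: only %XX escapes are decoded; '+' is left as-is.
path_style = [unquote(p) for p in paths]
# Form-style decoding: '+' is also turned into a space.
form_style = [unquote_plus(p) for p in paths]

print(path_style)  # ['/hello world', '/hello+world']
print(form_style)  # ['/hello world', '/hello world']
```

A server that decodes paths with path-style rules sees two different resources; one that applies form-style decoding to the path would conflate them.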

answered Oct 07 '22 12:10 by Brad Mace