I know that %20 and + both decode to the same binary value (a space), and for most web servers, especially those that map URLs to physical files, they will point to the same resource.
But my question is: must a URL like http://www.example.org/hello%20world point to the same resource as http://www.example.org/hello+world? Are they canonically the same?
In HTTP/1.0, + didn't map to a space, so I'm specifically asking about HTTP/1.1.
Only within the query string. The plus sign is a reserved character, so it must be percent-encoded to pass an actual '+' in either the path or the query string. Its use as a substitute for spaces is a W3C Recommendation that applies only to the query string:
Within the query string, the plus sign is reserved as shorthand notation for a space. Therefore, real plus signs must be encoded. This method was used to make query URIs easier to pass in systems which did not allow spaces.
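To make the path/query distinction concrete, here is a small sketch using Python's standard urllib.parse module. It only shows how one common library treats '+' in each context; it is not a statement about what any particular web server does, and the variable names are just for illustration.

```python
from urllib.parse import parse_qs, quote, quote_plus, unquote, unquote_plus

# Generic percent-decoding (what you would apply to a path segment):
# %20 becomes a space, but '+' is left alone.
path_from_percent = unquote("hello%20world")   # 'hello world'
path_from_plus = unquote("hello+world")        # 'hello+world'

# Form-style decoding (what applies to query strings):
# '+' is treated as shorthand for a space.
query_from_plus = unquote_plus("hello+world")  # 'hello world'
parsed_query = parse_qs("q=hello+world")       # {'q': ['hello world']}

# Encoding follows the same split.
encoded_for_path = quote("hello world")        # 'hello%20world'
encoded_for_query = quote_plus("hello world")  # 'hello+world'

# A real plus sign has to be percent-encoded to survive either way.
literal_plus = quote_plus("1+1")               # '1%2B1'
```

So the same library decodes hello+world differently depending on whether you tell it the text came from a path or from a query string.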
URI Comparison (RFC 2616):
When comparing two URIs to decide if they match or not, a client SHOULD use a case-sensitive octet-by-octet comparison of the entire URIs, with these exceptions:
- A port that is empty or not given is equivalent to the default port for that URI-reference;
- Comparisons of host names MUST be case-insensitive;
- Comparisons of scheme names MUST be case-insensitive;
- An empty abs_path is equivalent to an abs_path of "/".
Characters other than those in the "reserved" and "unsafe" sets (see RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.
Reserved characters (RFC 2396)
";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
So, on the third go-around: there is nothing official that declares them to be the same thing. Using '+' literally to direct http://example.org/hello+world to a directory called hello+world is incorrect, but there's nothing that says it should instead be considered equivalent to a space.
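The comparison rules quoted above can be sketched roughly as follows. This is an illustration under assumptions, not a reference implementation: the names normalize and uris_match are made up for this example, userinfo and fragments are ignored, and percent-escapes are only decoded when they stand for an RFC 2396 "unreserved" character, which is a deliberately conservative reading of "other than reserved and unsafe".

```python
import re
from urllib.parse import urlsplit

# RFC 2396 "unreserved" characters: alphanumerics plus the "mark" set.
# Assumption: only escapes of these characters are treated as equivalent
# to the literal character.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "-_.!~*'()"
)
DEFAULT_PORTS = {"http": 80, "https": 443}
ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(text):
    """Replace %XX with its character only if that character is unreserved."""
    def repl(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0)
    return ESCAPE.sub(repl, text)


def normalize(uri):
    """Reduce a URI to a tuple that can be compared octet by octet."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()                    # scheme: case-insensitive
    host = (parts.hostname or "").lower()            # host: case-insensitive
    port = parts.port or DEFAULT_PORTS.get(scheme)   # missing port = default
    path = decode_unreserved(parts.path) or "/"      # empty abs_path = "/"
    query = decode_unreserved(parts.query)
    return (scheme, host, port, path, query)


def uris_match(a, b):
    return normalize(a) == normalize(b)


# Equivalent under the quoted rules:
same_host = uris_match("HTTP://WWW.Example.org", "http://www.example.org:80/")
same_tilde = uris_match("http://www.example.org/%7Esmith",
                        "http://www.example.org/~smith")

# Not equivalent: a space is not unreserved and '+' is reserved, so
# neither side is rewritten and the octets differ.
space_vs_plus = uris_match("http://www.example.org/hello%20world",
                           "http://www.example.org/hello+world")
```

With this reading, same_host and same_tilde come out True while space_vs_plus comes out False, which matches the conclusion that nothing in the comparison rules makes %20 and + interchangeable in the path.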