
URL encoding the space character: + or %20?

When is a space in a URL encoded to +, and when is it encoded to %20?

BC. asked Oct 27 '09 23:10


People also ask

What characters should be URL-encoded?

Special characters needing encoding are: ':' , '/' , '?' , '#' , '[' , ']' , '@' , '!' , '$' , '&' , "'" , '(' , ')' , '*' , '+' , ',' , ';' , '=' , as well as '%' itself.

What is %20 in a URL?

In ASCII, the space character has code 32, which is 20 in hexadecimal. When you see "%20" in an encoded URL, it represents a space, for example, http://www.example.com/products%20and%20services.html.
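As a small illustration of where "%20" comes from, the sketch below derives it from the character code and then round-trips the example file name with Python's standard urllib.parse module. The variable names are only illustrative.

```python
from urllib.parse import quote, unquote

# The space character has code point 32, which is 0x20 in hexadecimal.
code_point = ord(" ")
hex_form = format(code_point, "02X")
encoded_space = "%" + hex_form

# Percent-encode and decode the file name from the example URL above.
file_name = "products and services.html"
encoded_name = quote(file_name)
decoded_name = unquote(encoded_name)

print(code_point, hex_form, encoded_space)
print(encoded_name)
print(decoded_name)
```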

What does %2f mean in a URL?

URL encoding converts characters into a format that can be transmitted over the Internet (w3Schools). So "/" is a separator, but "%2f" is an ordinary character that simply represents a literal "/" inside a single element of your URL.
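A rough sketch of that difference, using Python's urllib.parse: a "/" inside a value is encoded so that it no longer splits the path. The "/artists/" prefix and the "AC/DC" value are made up for illustration.

```python
from urllib.parse import quote, unquote

# Hypothetical value that itself contains a slash.
segment = "AC/DC"

# safe="" tells quote() to encode "/" as well (by default it is left alone).
encoded_segment = quote(segment, safe="")
path = "/artists/" + encoded_segment

# Splitting on "/" now sees the encoded value as one element of the path.
pieces = path.split("/")
decoded_pieces = [unquote(piece) for piece in pieces]

print(path)
print(pieces)
print(decoded_pieces)
```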

Is space allowed in URL?

Spaces are not allowed in URLs. They should be replaced by the string %20. In the query string part of the URL, %20 can be abbreviated using a plus sign (+).


2 Answers

From Wikipedia (emphasis and link added):

When data that has been entered into HTML forms is submitted, the form field names and values are encoded and sent to the server in an HTTP request message using method GET or POST, or, historically, via email. The encoding used by default is based on a very early version of the general URI percent-encoding rules, with a number of modifications such as newline normalization and replacing spaces with "+" instead of "%20". The MIME type of data encoded this way is application/x-www-form-urlencoded, and it is currently defined (still in a very outdated manner) in the HTML and XForms specifications.

So, the real percent encoding uses %20, while form data in URLs uses a modified form that encodes spaces as +. You are therefore most likely to see + only in the query string, after the ?.
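To make the two encodings concrete, here is a minimal sketch with Python's urllib.parse: quote() does plain percent-encoding, while quote_plus() and urlencode() follow the form-style convention. The sample text and the "q" field name are assumptions chosen for the example.

```python
from urllib.parse import quote, quote_plus, urlencode, unquote, unquote_plus

text = "hello world"

# Plain percent-encoding: the space becomes %20.
percent_encoded = quote(text)

# Form-style encoding (application/x-www-form-urlencoded): the space becomes +.
form_encoded = quote_plus(text)
query_string = urlencode({"q": text})

# Decoding must match the convention: unquote() leaves "+" untouched,
# unquote_plus() turns it back into a space.
plain_decoded = unquote("hello+world")
form_decoded = unquote_plus("hello+world")

print(percent_encoded, form_encoded, query_string)
print(plain_decoded, "|", form_decoded)
```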

Joey answered Sep 20 '22 17:09


This confusion is because URLs are still 'broken' to this day.

From a blog post:

Take "http://www.google.com" for instance. This is a URL. A URL is a Uniform Resource Locator and is really a pointer to a web page (in most cases). URLs have actually had a very well-defined structure since the first specification in 1994.

We can extract detailed information about the "http://www.google.com" URL:

+---------------+-------------------+
|      Part     |      Data         |
+---------------+-------------------+
|  Scheme       | http              |
|  Host         | www.google.com    |
+---------------+-------------------+

If we look at a more complex URL such as:

"https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third"

we can extract the following information:

+-------------------+---------------------+
|        Part       |       Data          |
+-------------------+---------------------+
|  Scheme           | https               |
|  User             | bob                 |
|  Password         | bobby               |
|  Host             | www.lunatech.com    |
|  Port             | 8080                |
|  Path             | /file;p=1           |
|  Path parameter   | p=1                 |
|  Query            | q=2                 |
|  Fragment         | third               |
+-------------------+---------------------+

https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third
\___/   \_/ \___/ \______________/ \__/\_______/ \_/ \___/
  |      |    |          |          |      | \_/  |    |
Scheme User Password    Host       Port  Path |   | Fragment
        \_____________________________/       | Query
                       |               Path parameter
                   Authority
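The same breakdown can be reproduced programmatically. The sketch below uses urlsplit() from Python's urllib.parse and assumes the example URL is the one whose parts the table lists (user bob, password bobby, host www.lunatech.com). urlsplit() keeps the path parameter inside the path, so it is separated here with a plain string split, which is a simplification that only looks at the first ";".

```python
from urllib.parse import urlsplit

# Assumed full form of the example URL, rebuilt from the parts in the table.
url = "https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third"
parts = urlsplit(url)

# Simplified extraction of the path parameter: everything after the first ";".
_, _, path_parameter = parts.path.partition(";")

print("Scheme:        ", parts.scheme)
print("User:          ", parts.username)
print("Password:      ", parts.password)
print("Host:          ", parts.hostname)
print("Port:          ", parts.port)
print("Authority:     ", parts.netloc)
print("Path:          ", parts.path)
print("Path parameter:", path_parameter)
print("Query:         ", parts.query)
print("Fragment:      ", parts.fragment)
```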

The reserved characters are different for each part.

For HTTP URLs, a space in a path fragment part has to be encoded to "%20" (not, absolutely not "+"), while the "+" character in the path fragment part can be left unencoded.

Now in the query part, spaces may be encoded to either "+" (for backwards compatibility: do not try to search for it in the URI standard) or "%20" while the "+" character (as a result of this ambiguity) has to be escaped to "%2B".

This means that the "blue+light blue" string has to be encoded differently in the path and query parts:

"http://example.com/blue+light%20blue?blue%2Blight+blue".
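As a sketch of how this plays out in practice, the snippet below encodes the same string once for the path and once for the query using Python's urllib.parse, then assembles the URL from the already-encoded parts. Passing safe="+" to quote() is a choice made here to leave "+" unencoded in the path, matching the example above.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "blue+light blue"

# Path: space -> %20, and "+" is left as-is (safe="+").
path_part = quote(text, safe="+")

# Query (form-style): space -> "+", and a literal "+" -> %2B.
query_part = quote_plus(text)

# Each part is encoded on its own before the URL is put together.
url = f"http://example.com/{path_part}?{query_part}"

# Decoding each part with the matching function recovers the original string.
path_decoded = unquote(path_part)
query_decoded = unquote_plus(query_part)

print(url)
print(path_decoded, "|", query_decoded)
```

Encoding the finished URL in one pass would not work, because the encoder could not tell which "+" is literal and which stands for a space.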

From there you can deduce that encoding a fully constructed URL is impossible without a syntactical awareness of the URL structure.

This boils down to:

You should have %20 before the ? and + after.

Source

Matas Vaitkevicius answered Sep 18 '22 17:09