I would like to know why '%20' stands for a space in URLs: why that particular sequence was chosen, and why an encoding is needed in the first place.
In ASCII, the space character is assigned the number 32, which is 20 in hexadecimal. When you see "%20" in an encoded URL, it represents a space, for example, http://www.example.com/products%20and%20services.html.
RFC 3986, the current URI specification, does not include the space among the characters allowed in a URI, so a space must not be left untreated in a URL and must instead be converted (encoded). The older RFC 1738 described such characters as "unsafe". Special characters in URLs are expressed as a percent sign followed by two hexadecimal digits.
As per RFC 1738: Unsafe: Characters can be unsafe for a number of reasons. The space character is unsafe because significant spaces may disappear and insignificant spaces may be introduced when URLs are transcribed or typeset or subjected to the treatment of word-processing programs.
It's called percent encoding. Some characters can't be in a URI (for example #, as it denotes the URL fragment), so they are represented with characters that can be (# becomes %23).
Here's an excerpt from the Wikipedia article on percent-encoding:
When a character from the reserved set (a "reserved character") has special meaning (a "reserved purpose") in a certain context, and a URI scheme says that it is necessary to use that character for some other purpose, then the character must be percent-encoded. Percent-encoding a reserved character involves converting the character to its corresponding byte value in ASCII and then representing that value as a pair of hexadecimal digits. The digits, preceded by a percent sign ("%") which is used as an escape character, are then used in the URI in place of the reserved character. (For a non-ASCII character, it is typically converted to its byte sequence in UTF-8, and then each byte value is represented as above.)
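The procedure in that excerpt can be sketched in a few lines of JavaScript. This is only an illustration, not how any particular browser implements it: `percentEncode` is a hypothetical helper name, and it assumes that everything outside the RFC 3986 "unreserved" set (letters, digits, `-`, `.`, `_`, `~`) should be encoded, which is stricter than most real encoders.

```javascript
// Hypothetical helper: percent-encode every character that is not in the
// RFC 3986 "unreserved" set, using the UTF-8 bytes of the character.
function percentEncode(text) {
  const unreserved = /^[A-Za-z0-9\-._~]$/;
  const encoder = new TextEncoder(); // TextEncoder always produces UTF-8
  let out = '';
  for (const ch of text) { // iterates by code point, not by UTF-16 unit
    if (unreserved.test(ch)) {
      out += ch;
      continue;
    }
    // One "%XX" triplet per byte; pad so that 0x0A becomes "0A", not "A".
    for (const byte of encoder.encode(ch)) {
      out += '%' + byte.toString(16).toUpperCase().padStart(2, '0');
    }
  }
  return out;
}

console.log(percentEncode('a b')); // a%20b
console.log(percentEncode('#'));   // %23
console.log(percentEncode('é'));   // %C3%A9 (two UTF-8 bytes)
```

The last line shows the non-ASCII case from the excerpt: a single character becomes one triplet per UTF-8 byte.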
The space character's character code is 32:
> ' '.charCodeAt(0)
32
Which is 20 in base-16:
> ' '.charCodeAt(0).toString(16)
"20"
Tack a percent sign in front of it and you get %20.
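In practice you rarely build the sequence by hand. As a small sketch using only the standard `encodeURIComponent` and `decodeURIComponent` functions, the same round trip looks like this (the variable names are just for illustration):

```javascript
// Encode a single space with the built-in function.
const encoded = encodeURIComponent(' ');
console.log(encoded); // %20

// Decode it again.
const decoded = decodeURIComponent('%20');
console.log(decoded === ' '); // true

// Going the other way by hand: hex "20" is 32, which is the space character.
const codeFromHex = parseInt('20', 16);
console.log(codeFromHex);                              // 32
console.log(String.fromCharCode(codeFromHex) === ' '); // true
```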
URLs have strict syntactic rules: / is a special path separator character, spaces are not allowed, and all characters must come from a certain subset of ASCII. To embed arbitrary characters in URLs regardless of these restrictions, bytes can be percent-encoded. The byte 0x20 represents a space in the ASCII encoding (and most other encodings), hence %20 is the URL-encoded version of it.
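To see how the separator rule interacts with encoding, here is a short sketch with the standard `encodeURIComponent` and `encodeURI` functions. The file name and host are illustrative values, reusing the example.com URL from earlier in the thread.

```javascript
// A file name containing spaces, used as a single path segment.
const fileName = 'products and services.html';
const url = 'http://www.example.com/' + encodeURIComponent(fileName);
console.log(url); // http://www.example.com/products%20and%20services.html

// encodeURIComponent treats "/" as data and escapes it...
const asData = encodeURIComponent('a/b');
console.log(asData); // a%2Fb

// ...while encodeURI leaves "/" alone as a separator but still escapes spaces.
const asPath = encodeURI('a/b c');
console.log(asPath); // a/b%20c
```

Which function fits depends on whether the string is one component (a segment or a query value) or an already-structured URL whose separators should survive.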