URLs cannot contain spaces. URL encoding normally replaces a space with a plus (+) sign or with %20.
A URL cannot contain literal spaces, so a space always has to be encoded. Nothing in the generic URI syntax defines that a space is replaced with a + sign; that convention comes from form encoding.
In ASCII, a space is assigned the number 32, which is 20 in hexadecimal. When you see "%20", it represents a space in an encoded URL, for example, http://www.example.com/products%20and%20services.html.
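The arithmetic above can be checked with a short Python sketch. The variable names are only illustrative, and the file name is the one from the example URL.

```python
from urllib.parse import unquote

# A space is code point 32, which is 20 in hexadecimal.
space_code = ord(" ")
hex_digits = format(space_code, "02X")

# Percent-encoding is "%" followed by the two hex digits of the byte.
encoded_space = "%" + hex_digits

# Encode only the spaces of the example file name by hand.
encoded_name = "products and services.html".replace(" ", encoded_space)

# The standard library reverses the encoding.
decoded_name = unquote(encoded_name)

print(space_code, hex_digits, encoded_name, decoded_name)
```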
So the characters that may appear unencoded are basically A–Z, a–z, 0–9, -, ., _, ~, !, $, &, ', (, ), *, +, the comma (,), ;, =, :, @, as well as % that must be followed by two hexadecimal digits. Any other character or byte needs to be encoded using percent-encoding.
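As a rough illustration, the character list above can be turned into a regular expression. This is a minimal Python sketch, not a full URL validator; the name is_valid_segment is hypothetical, and the check covers a single piece of text rather than a whole URL.

```python
import re

# Either one of the listed literal characters, or "%" followed by
# exactly two hexadecimal digits.
_SEGMENT_RE = re.compile(
    r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})*"
)

def is_valid_segment(segment: str) -> bool:
    """Return True if every character is allowed or correctly percent-encoded."""
    return _SEGMENT_RE.fullmatch(segment) is not None

print(is_valid_segment("products%20and%20services.html"))
print(is_valid_segment("products and services.html"))
```

A raw space fails the check, and so does a "%" that is not followed by two hexadecimal digits.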
You can find a nice list of corresponding URL encoded characters on W3Schools.
A literal + becomes %2B when percent-encoded, and a space becomes %20. + characters in the path component are expected to be treated literally. To be explicit: + is only a special character in the query component.
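The difference between the two components can be seen with Python's urllib.parse, shown here only as one possible illustration: unquote leaves + alone, as is appropriate for a path, while unquote_plus also turns + into a space, as is appropriate for form-encoded query values. The sample strings are made up.

```python
from urllib.parse import quote, unquote, unquote_plus

# Path-style decoding: "+" stays a literal plus, "%20" becomes a space.
path_decoded = unquote("/a+b%20c")

# Form/query-style decoding: both "+" and "%20" become a space.
query_value_decoded = unquote_plus("a+b%20c")

# A literal plus that must survive in a query value is sent as %2B.
literal_plus_encoded = quote("+", safe="")

print(path_decoded, query_value_decoded, literal_plus_encoded)
```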
https://www.rfc-editor.org/rfc/rfc3986
Space characters may only be encoded as "+" in one context: application/x-www-form-urlencoded key-value pairs.
RFC-1866 (the HTML 2.0 specification), paragraph 8.2.1, subparagraph 1, says: "The form field names and values are escaped: space characters are replaced by '+', and then reserved characters are escaped".
Here is an example of such a string in a URL where RFC-1866 allows encoding spaces as pluses: "http://example.com/over/there?name=foo+bar". So, only after "?" can spaces be replaced by pluses (in other cases, spaces should be encoded as "%20"). This way of encoding form data is also given in later HTML specifications; for example, look for the relevant paragraphs about application/x-www-form-urlencoded in the HTML 4.01 Specification.
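A small Python sketch of the form-encoding case, using the name=foo+bar pair from the example URL. By default urllib.parse.urlencode produces the + form; passing quote_via=quote produces %20 instead, and a form-aware parser accepts both.

```python
from urllib.parse import parse_qs, quote, urlencode

# Default form encoding: spaces become "+".
form_style = urlencode({"name": "foo bar"})

# Same data, but with spaces percent-encoded as "%20".
percent_style = urlencode({"name": "foo bar"}, quote_via=quote)

# Both spellings decode to the same value when parsed as form data.
parsed_plus = parse_qs(form_style)
parsed_percent = parse_qs(percent_style)

print(form_style, percent_style, parsed_plus, parsed_percent)
```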
But because it is hard to always determine the context correctly, the best practice is to never encode spaces as "+". It is better to percent-encode all characters except the "unreserved" ones defined in RFC-3986, p.2.3. Here is a code example that illustrates what should be encoded. It is written in Delphi (Pascal), but it should be easy to follow for any programmer, whatever language they normally use:
(* percent-encode all characters except the unreserved ones defined in RFC-3986, p.2.3 *)
function UrlEncodeRfcA(const S: AnsiString): AnsiString;
const
  HexCharArrA: array [0..15] of AnsiChar = '0123456789ABCDEF';
var
  I: Integer;
  c: AnsiChar;
begin
  // percent-encoding, see RFC-3986, p. 2.1
  Result := S;
  // walk backwards so that inserting characters does not shift the indexes still to be visited
  for I := Length(S) downto 1 do
  begin
    c := S[I];
    case c of
      'A' .. 'Z', 'a' .. 'z', // alpha
      '0' .. '9', // digit
      '-', '.', '_', '~':; // rest of unreserved characters as defined in the RFC-3986, p.2.3
    else
      begin
        // replace the byte with '%' followed by its two uppercase hex digits
        Result[I] := '%';
        Insert('00', Result, I + 1);
        Result[I + 1] := HexCharArrA[(Byte(c) shr 4) and $F];
        Result[I + 2] := HexCharArrA[Byte(c) and $F];
      end;
    end;
  end;
end;

(* encode a Unicode string as UTF-8 first, then percent-encode the resulting bytes *)
function UrlEncodeRfcW(const S: UnicodeString): AnsiString;
begin
  Result := UrlEncodeRfcA(Utf8Encode(S));
end;
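For readers who do not use Delphi, the same rule can be sketched in Python. This is an illustrative port under the same assumptions (UTF-8 bytes, everything except the unreserved characters of RFC-3986, p.2.3 is encoded); the name url_encode_rfc is hypothetical.

```python
# Unreserved characters from RFC-3986, p.2.3, as a set of byte values.
UNRESERVED = set(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._~"
)

def url_encode_rfc(text: str) -> str:
    """Percent-encode every UTF-8 byte of text that is not unreserved."""
    pieces = []
    for byte in text.encode("utf-8"):
        if byte in UNRESERVED:
            pieces.append(chr(byte))
        else:
            # "%" followed by two uppercase hexadecimal digits
            pieces.append(f"%{byte:02X}")
    return "".join(pieces)

print(url_encode_rfc("foo bar+baz"))
```

In practice, urllib.parse.quote(text, safe="") from the standard library should give the same kind of result; the loop is spelled out here only to mirror the Delphi code.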