Characters allowed in a URL

Tags:

url


EDIT: As @Jukka K. Korpela correctly points out, RFC 1738 was updated by RFC 3986, which expanded and clarified the characters valid for host. Unfortunately the grammar is not easy to copy and paste, but I'll do my best.

In first-match-wins order:

host        = IP-literal / IPv4address / reg-name

IP-literal  = "[" ( IPv6address / IPvFuture  ) "]"

IPvFuture   = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )

IPv6address =         6( h16 ":" ) ls32
                  /                       "::" 5( h16 ":" ) ls32
                  / [               h16 ] "::" 4( h16 ":" ) ls32
                  / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
                  / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
                  / [ *3( h16 ":" ) h16 ] "::"    h16 ":"   ls32
                  / [ *4( h16 ":" ) h16 ] "::"              ls32
                  / [ *5( h16 ":" ) h16 ] "::"              h16
                  / [ *6( h16 ":" ) h16 ] "::"

ls32        = ( h16 ":" h16 ) / IPv4address
                  ; least-significant 32 bits of address

h16         = 1*4HEXDIG 
               ; 16 bits of address represented in hexadecimal

IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet

dec-octet   = DIGIT                 ; 0-9
              / %x31-39 DIGIT         ; 10-99
              / "1" 2DIGIT            ; 100-199
              / "2" %x30-34 DIGIT     ; 200-249
              / "25" %x30-35          ; 250-255

reg-name    = *( unreserved / pct-encoded / sub-delims )

unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
              ; a practical shortcut, and the rule that most closely resembles the original answer

reserved    = gen-delims / sub-delims

gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

pct-encoded = "%" HEXDIG HEXDIG
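As a rough illustration, two of the simpler rules above (IPv4address and reg-name) can be transcribed into regular expressions. This is only a sketch: the names `is_ipv4address` and `is_reg_name` are made up for this example, and the IP-literal branch (IPv6address / IPvFuture) is deliberately left out.

```python
import re

# dec-octet = DIGIT / %x31-39 DIGIT / "1" 2DIGIT / "2" %x30-34 DIGIT / "25" %x30-35
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"
# IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
IPV4ADDRESS = re.compile(rf"{DEC_OCTET}(?:\.{DEC_OCTET}){{3}}")

UNRESERVED = r"[A-Za-z0-9\-._~]"
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"
SUB_DELIMS = r"[!$&'()*+,;=]"
# reg-name = *( unreserved / pct-encoded / sub-delims )
REG_NAME = re.compile(rf"(?:{UNRESERVED}|{PCT_ENCODED}|{SUB_DELIMS})*")


def is_ipv4address(host: str) -> bool:
    """True if the whole string matches the IPv4address rule."""
    return IPV4ADDRESS.fullmatch(host) is not None


def is_reg_name(host: str) -> bool:
    """True if the whole string matches the reg-name rule (may be empty)."""
    return REG_NAME.fullmatch(host) is not None


print(is_ipv4address("192.168.0.1"))  # True
print(is_ipv4address("256.1.1.1"))    # False: 256 is not a dec-octet
print(is_reg_name("example.com"))     # True
print(is_reg_name("ex ample.com"))    # False: space is not allowed
```

Note that a string such as 256.1.1.1 fails the IPv4address rule but still matches reg-name, since it consists only of digits and dots. This is why the order in which the alternatives are tried matters.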

Original answer from RFC 1738 specification:

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.

^ This wording has been obsolete since 1998, when RFC 2396 revised the generic URL syntax of RFC 1738.

The characters allowed in a URI are either reserved or unreserved (or a percent character as part of a percent-encoding).

http://en.wikipedia.org/wiki/Percent-encoding#Types_of_URI_characters

says that these are the RFC 3986 unreserved characters (sec. 2.3), plus the reserved characters (sec. 2.2) where they need to retain their special meaning, plus the percent character as part of a percent-encoding.

The full list of the 66 unreserved characters is in RFC 3986, here: https://www.rfc-editor.org/rfc/rfc3986#section-2.3

This is any character in the following regex set:

[A-Za-z0-9_.\-~]
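A quick way to sanity-check that regex set is to enumerate the ASCII range and count the matches. The sketch below uses Python; `UNRESERVED_RE` and `is_unreserved` are names chosen for this example only.

```python
import re

# The character class from above, applied to one character at a time.
UNRESERVED_RE = re.compile(r"[A-Za-z0-9_.\-~]")

# Collect every ASCII character that matches the class.
unreserved = [chr(c) for c in range(128) if UNRESERVED_RE.fullmatch(chr(c))]


def is_unreserved(text: str) -> bool:
    """True if every character of text is an unreserved character."""
    return all(UNRESERVED_RE.fullmatch(ch) for ch in text)


print(len(unreserved))                 # 66 = 26 + 26 + 10 + 4
print(is_unreserved("foo-bar_1.0~x"))  # True
print(is_unreserved("foo/bar"))        # False: "/" is reserved
```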

I tested it by requesting my website (Apache) with all the characters available on my German keyboard as a URL parameter:

http://example.com/?^1234567890ß´qwertzuiopü+asdfghjklöä#<yxcvbnm,.-°!"§$%&/()=? `QWERTZUIOPÜ*ASDFGHJKLÖÄ\'>YXCVBNM;:_²³{[]}\|µ@€~

These were not encoded:

^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ,.-!/()=?`*;:_{}[]\|~

Not encoded after urlencode():

0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-_

Not encoded after rawurlencode():

0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-_~

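Note: Before PHP 5.3.0, rawurlencode() encoded ~ because of RFC 1738. That was replaced by RFC 3986, so it's safe to use now. I did not understand at first why, for example, {} are encoded by rawurlencode() even though RFC 3986 does not mention them; the likely reason is that characters outside the reserved and unreserved sets are not valid in a URI at all, so they have to be percent-encoded.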
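For comparison, the same experiment can be run in Python, whose `urllib.parse.quote` and `quote_plus` are only rough analogs of PHP's rawurlencode() and urlencode(), not equivalents. In particular, the sketch assumes Python 3.7 or later, where both functions leave ~ unencoded, whereas the urlencode() result above does not include ~.

```python
from urllib.parse import quote, quote_plus

sample = "a b~{}+"

# safe="" means: encode everything except letters, digits and "_.-~".
raw_style = quote(sample, safe="")        # space becomes %20
form_style = quote_plus(sample, safe="")  # space becomes +

print(raw_style)   # a%20b~%7B%7D%2B
print(form_style)  # a+b~%7B%7D%2B
```

The braces are percent-encoded in both cases, which matches the rawurlencode() result above.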

An additional test I made concerned auto-linking in mail texts. I tested Mozilla Thunderbird, aol.com, outlook.com, gmail.com, gmx.de and yahoo.de, and they fully linked URLs containing these chars:

0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-_~+#,%&=*;:@

Of course the ? was linked, too, but only if it was used once.

Some people would now suggest using only the rawurlencode() chars, but have you ever heard of anyone having problems opening these websites?

Asterisk
http://wayback.archive.org/web/*/http://google.com

Colon
https://en.wikipedia.org/wiki/Wikipedia:About

Plus
https://plus.google.com/+google

At sign, Colon, Comma and Exclamation mark
https://www.google.com/maps/place/USA/@36.2218457,...

Because of that, these chars should be usable unencoded without problems. Of course you should not use & and ; because of encoding sequences like &amp;. The same reasoning applies to %, as it is used to encode chars in general, and to =, as it assigns a value to a parameter name.
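The problem with &, = and % inside a parameter value can be seen by parsing a query string built without encoding. The following Python sketch uses a made-up parameter name `q` and a made-up value; it illustrates the general point and is not part of the tests described above.

```python
from urllib.parse import parse_qs, urlencode

value = "a&b=c%"

# Naive concatenation: "&" and "=" are read as delimiters, so the value
# is split into two parameters.
naive = "q=" + value
print(parse_qs(naive))    # {'q': ['a'], 'b': ['c%']}

# Percent-encoding the value keeps it intact through a round trip.
encoded = urlencode({"q": value})
print(encoded)            # q=a%26b%3Dc%25
print(parse_qs(encoded))  # {'q': ['a&b=c%']}
```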

Finally, I would say it's OK to use these unencoded:

0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-_~!+,*:@

But if you expect randomly generated URLs, you should not end them with punctuation marks like . or !, because some mail apps will not auto-link the last character:

http://example.com/?foo=bar! < last char not linked
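If you generate URLs programmatically, both recommendations can be turned into simple checks. The Python sketch below is only one possible reading of the advice: `is_conservative` and `ends_with_risky_punctuation` are hypothetical helper names, and treating exactly "." and "!" as risky trailing characters is an assumption based on the example above.

```python
import re

# The characters suggested above as usable unencoded.
CONSERVATIVE_RE = re.compile(r"[0-9A-Za-z.\-_~!+,*:@]*")

# Assumption: trailing characters that some mail apps leave out of the link.
RISKY_TRAILING = ".!"


def is_conservative(component: str) -> bool:
    """True if component uses only the suggested unencoded characters."""
    return CONSERVATIVE_RE.fullmatch(component) is not None


def ends_with_risky_punctuation(url: str) -> bool:
    """True if the last character might be dropped by an auto-linker."""
    return url.endswith(tuple(RISKY_TRAILING))


print(is_conservative("Wikipedia:About"))                           # True
print(is_conservative("foo bar"))                                   # False
print(ends_with_risky_punctuation("http://example.com/?foo=bar!"))  # True
```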
