I'm writing some code that processes URLs, and I want to make sure I'm not leaving some strange case out...
Are there any valid characters for a host other than: A-Z, 0-9, "-" and "."?
(This includes anything that can be in subdomains, etc. Essentially, anything between :// and the first /)
Thanks!
A URL is composed from a limited set of characters belonging to the US-ASCII character set. These characters include digits (0-9), letters (A-Z, a-z), and a few special characters ("-", ".", "_", "~").
DNS domain names. DNS names can contain only alphabetical characters (A-Z), numeric characters (0-9), the minus sign (-), and the period (.). Period characters are allowed only when they are used to delimit the components of domain style names.
Some characters are considered unsafe in URLs: "{", "}", "|", "\", "^", "~", "[", "]", and "`". All unsafe characters must always be encoded within a URL.
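For what it's worth, here is a minimal sketch of percent-encoding those characters in Python, assuming the standard library's urllib.parse.quote is acceptable for your use case. The helper name encode_segment is made up for illustration. Note that quote follows RFC 3986 in Python 3.7 and later, so it leaves "~" unencoded even though the list above calls it unsafe.

```python
from urllib.parse import quote


def encode_segment(segment: str) -> str:
    """Percent-encode one URL path segment (hypothetical helper).

    safe="" means even "/" is encoded, so the result is a single segment.
    Letters, digits, and "_", ".", "-", "~" are left as they are.
    """
    return quote(segment, safe="")


if __name__ == "__main__":
    # Each unsafe character is replaced by "%" plus its hex code.
    print(encode_segment("{a}|b"))
    print(encode_segment("x^y[z]`"))
```

This only applies to the path and query parts of a URL. The host part is not percent-encoded this way; it is restricted to the hostname character set instead.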
While a hostname may not contain other characters, such as the underscore character (_), other DNS names may contain the underscore. This restriction was lifted by RFC 2181.
Please see Restrictions on valid host names:
Hostnames are composed of series of labels concatenated with dots, as are all domain names. For example, "en.wikipedia.org" is a hostname. Each label must be between 1 and 63 characters long, and the entire hostname has a maximum of 255 characters.
RFCs mandate that a hostname's labels may contain only the ASCII letters 'a' through 'z' (case-insensitive), the digits '0' through '9', and the hyphen. Hostname labels cannot begin or end with a hyphen. No other symbols, punctuation characters, or blank spaces are permitted.
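The rules quoted above could be checked with something like the following Python sketch. It is not a definitive validator: the names LABEL_RE, is_valid_hostname, and url_has_valid_host are made up for illustration, accepting one trailing dot is my own assumption, and internationalized names are only covered in their already-encoded ASCII ("xn--") form.

```python
import re
from urllib.parse import urlsplit

# One label: 1-63 characters from a-z, 0-9 and "-" (case-insensitive),
# with no hyphen at the start or at the end.
LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_hostname(hostname: str) -> bool:
    """Check a hostname against the label rules quoted above."""
    # Assumption: tolerate a single trailing dot (fully qualified form).
    if hostname.endswith("."):
        hostname = hostname[:-1]
    # Overall limit of 255 characters, as stated in the quoted text.
    if not hostname or len(hostname) > 255:
        return False
    # Every dot-separated label must match; an empty label ("a..b") fails.
    return all(LABEL_RE.fullmatch(label) for label in hostname.split("."))


def url_has_valid_host(url: str) -> bool:
    """Pull the host out of a URL (between "://" and the first "/") and check it."""
    host = urlsplit(url).hostname  # drops any userinfo and port
    return host is not None and is_valid_hostname(host)


if __name__ == "__main__":
    print(is_valid_hostname("en.wikipedia.org"))
    print(is_valid_hostname("-bad.example"))
    print(url_has_valid_host("http://foo_bar.example/path"))
```

Using urlsplit rather than splitting on "://" and "/" by hand avoids tripping over ports and user:password@ prefixes. Be aware that this check also accepts dotted IPv4 addresses, since they consist only of digits and dots, and rejects bracketed IPv6 literals.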
No, that is all that is allowed.
Here is a reference if you would like to read more: http://www.ietf.org/rfc/rfc1034.txt