I was reviewing some information about the components of a URL, but I can't find a reasonable explanation of what the longest possible URL looks like and what each component could be. I want to know what a full URL could look like, taking advantage of all of the intricacies. I also hope to build a little GUI that helps explain them once I understand them better, but until then I will try with the components I am aware of:
[ ]
Brackets contain a full component
|
Pipe shows possible subcomponents of a component
( )
Parenthesis contain notes, thoughts, and assumptions about the sub/components
My full understanding:
[type][://][subdomain][domain][port][path][file][query][hash]
Here are the descriptions of each component; if it has an *, it is optional:
[type]
* = [ (type {http | https | ftp | file | etc...}) ]
(although this is optional, I believe it is effectively required: modern browsers insert the type when making the request to the server, and the server may return a different type as well)
[://]
= (don't know what this is called)
[subdomain]
* = [ [subdomain] | [subdomain]subdomain ]
[domain]
= [ name . (type {com | org | etc..}) ]
[port]
* = [ (blank which is by default port:80) | port:** ]
[path]
* = [ (blank) | [path] | [path]path ]
[file]
= [ name . (type {html | php | (etc...) }) ]
[query]
* = [ ?[ blank (i.e. no query) | parameter=value | parameter=value&parameter=value (etc...) ]]
[hash]
* = [ #[ blank (i.e. no hash) | anyStringToBeParsedClientSide (usually for persistence) ]]
( just learned a hash is also known as a fragment identifier )
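Python's standard urllib.parse module can pull the last two components apart. In this sketch the URL is made up. Note that name=value pairs joined by & are a convention (the application/x-www-form-urlencoded format used by HTML forms) rather than part of the generic URL syntax, which only defines the query as the text between "?" and "#".

```python
from urllib.parse import urldefrag, urlsplit, parse_qsl, urlencode

# Made-up example URL with a query and a fragment.
url = "https://example.com/search?q=url+syntax&page=2&tag=a&tag=b#results"

# The fragment is everything after the first "#".
base, fragment = urldefrag(url)

# The query is the text between "?" and "#"; parse_qsl decodes "+" to a space
# and keeps both the order and any repeated names.
pairs = parse_qsl(urlsplit(url).query)

print(fragment)          # results
print(pairs)             # [('q', 'url syntax'), ('page', '2'), ('tag', 'a'), ('tag', 'b')]
print(urlencode(pairs))  # q=url+syntax&page=2&tag=a&tag=b
```

Using a list of pairs instead of a dict is deliberate: a query may repeat a name (tag=a&tag=b), and a dict would silently drop one value.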
What else am I forgetting, or is there a good site that explains them that I am overlooking? Please also correct my naming, since it is likely incorrect and I am trying to learn what these parts are called.
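For reference, Python's urllib.parse.urlsplit names the pieces roughly as RFC 3986 does (scheme, netloc for the authority, path, query, fragment). The sketch below uses a made-up URL to show one piece the list above does not include: the userinfo (user:pass@) at the start of the authority; RFC 3986 deprecates putting a password there. It also shows that the port is written after the host with a colon, and that when it is omitted the default depends on the scheme. DEFAULT_PORTS and effective_port are hypothetical helpers covering only three schemes.

```python
from urllib.parse import urlsplit

# Illustrative URL using every component; all values are made up.
url = "https://user:pass@sub.example.com:8443/dir/file.html?key=value&a=b#frag"
parts = urlsplit(url)

components = {
    "scheme": parts.scheme,      # https
    "username": parts.username,  # user  (userinfo, which ends at "@")
    "password": parts.password,  # pass
    "hostname": parts.hostname,  # sub.example.com
    "port": parts.port,          # 8443 (None when no ":port" is written)
    "path": parts.path,          # /dir/file.html
    "query": parts.query,        # key=value&a=b
    "fragment": parts.fragment,  # frag
}
for name, value in components.items():
    print(f"{name:>8}: {value}")

# Assumed table of well-known default ports, used when the URL omits one.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def effective_port(u: str):
    """Return the explicit port if present, else the scheme's default (or None)."""
    p = urlsplit(u)
    return p.port if p.port is not None else DEFAULT_PORTS.get(p.scheme)

print(effective_port("http://example.com/"))   # 80
print(effective_port("https://example.com/"))  # 443
```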
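The URL syntax: the scheme name. The scheme identifies the protocol used to access the resource on the Internet. The scheme name is always followed by a colon; when the URL also names a host (the authority), two slashes follow, giving the familiar three characters :// (a colon and two slashes). The most commonly used schemes are http://, https://, ftp://, and mailto: (which takes no slashes, as in mailto:someone@example.com).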
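To see that the "//" belongs to the authority rather than to the scheme, here is a small sketch (example URLs made up) that splits three URLs with Python's urllib.parse.urlsplit: the mailto URL has no authority at all, and the file URL has an empty authority, which is why it shows three slashes.

```python
from urllib.parse import urlsplit

urls = ["https://example.com/", "mailto:someone@example.com", "file:///etc/hosts"]
results = {u: urlsplit(u) for u in urls}

for u, p in results.items():
    # netloc is the authority introduced by "//"; it is empty when absent.
    print(f"{u!r}: scheme={p.scheme!r} netloc={p.netloc!r} path={p.path!r}")
```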
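According to the BNF in https://www.rfc-editor.org/rfc/rfc1738, the hpath element of an HTTP URL is a series of segments separated by '/', and the period ('.') is one of the characters allowed in a segment. So the file name is not a separate component: "index.html" is simply the last segment of the path.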
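The following sketch (made-up URL) shows how a parser splits the path into segments, and that what the question calls [file] is just the last segment. Using pathlib's PurePosixPath for the name and extension is a convenience assumption here, since URL paths happen to use the same "/" separator.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit, unquote

url = "https://example.com/docs/v1.2/index.html"
path = urlsplit(url).path                        # "/docs/v1.2/index.html"

# Drop the empty string before the leading "/" and percent-decode each segment.
segments = [unquote(s) for s in path.split("/")[1:]]
name = PurePosixPath(path).name                  # last segment
suffix = PurePosixPath(path).suffix              # its extension

print(segments)  # ['docs', 'v1.2', 'index.html']
print(name)      # index.html
print(suffix)    # .html
```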
If you really want all the intricacies, standards documents are the only way to go, and learning to find and read them definitely pays off. And RFCs aren't typically very hard to read.
In this case, RFC 1738 (Uniform Resource Locators) is the resource you want. It's no more "overly technical" than what you've come up with so far; in fact, section 5 has the formal BNF grammar similar to what you wrote.
You might also be interested in RFC 3986 (Uniform Resource Identifiers), which updates RFC 1738 and describes the generic URI syntax, of which URLs are a subset.
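RFC 3986 (Appendix B) even gives a regular expression for breaking a URI reference into its five main components. A Python sketch using it follows; split_uri is a hypothetical helper name, and the regex only splits a URI, it does not validate it.

```python
import re

# Regular expression from RFC 3986, Appendix B, for splitting a URI reference.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split_uri(uri: str) -> dict:
    """Split a URI into scheme, authority, path, query and fragment.

    Every part of the pattern is optional, so it matches any string;
    components that are absent come back as None (the path may be "").
    """
    m = URI_RE.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

print(split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
# {'scheme': 'http', 'authority': 'www.ics.uci.edu', 'path': '/pub/ietf/uri/', 'query': None, 'fragment': 'Related'}
```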
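Some of the things you mention are specific to HTTP, described in RFC 2616 (Hypertext Transfer Protocol, HTTP/1.1), which has since been superseded by later RFCs, most recently RFC 9110 and its companions. Section 3.2 of RFC 2616 briefly touches on URIs.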
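One HTTP-specific detail that matters for the fragment: when a client requests a URL, the request line carries only the path and query (the "origin-form" defined for HTTP/1.1), the host and port go in the Host header, and the fragment is never sent. A minimal sketch, where request_target is a hypothetical helper:

```python
from urllib.parse import urlsplit

def request_target(url: str) -> str:
    """Build the origin-form request target an HTTP/1.1 client would send:
    the path plus an optional "?query", never the "#fragment"."""
    parts = urlsplit(url)
    target = parts.path or "/"   # an empty path is sent as "/"
    if parts.query:
        target += "?" + parts.query
    return target

print(request_target("https://example.com:8443/a/b.html?x=1#top"))  # /a/b.html?x=1
print(request_target("http://example.com"))                         # /
```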