
URL parts canonical terminology

I've been reading around and it seems there is no coherent, fully accepted terminology for the parts of a URL. Is that true? I'd like to know which standards exist for URL part terminology. Which is the most common? Is there any well-established standard?

I found the following:

  1. RFC 3986, section 3

     foo://example.com:8042/over/there?name=ferret#nose
     \_/   \______________/\_________/ \_________/ \__/
      |           |            |            |        |
   scheme     authority       path        query   fragment
      |   _____________________|__
     / \ /                        \
     urn:example:animal:ferret:nose
  2. window.location from JavaScript in browsers

protocol://username:password@hostname:port/pathname?search#hash
-----------------------------href------------------------------
                             -----host----
-----------      origin      -------------
  • protocol - protocol scheme of the URL, including the final ':'
  • hostname - domain name
  • port - port number
  • pathname - /pathname
  • search - ?parameters
  • hash - #fragment_identifier
  • username - username specified before the domain name
  • password - password specified before the domain name
  • href - the entire URL
  • origin - protocol://hostname:port
  • host - hostname:port
  3. Node.js, url module

Above the line with the URL you see the properties of Node's old url API (the object returned by url.parse()), whilst under the line you see the new API (the WHATWG URL object). It seems Node shifted from RFC-style terminology to a more browser-friendly terminology, that is, one similar to the browser's window.location.

┌────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                              href                                              │
├──────────┬──┬─────────────────────┬────────────────────────┬───────────────────────────┬───────┤
│ protocol │  │        auth         │          host          │           path            │ hash  │
│          │  │                     ├─────────────────┬──────┼──────────┬────────────────┤       │
│          │  │                     │    hostname     │ port │ pathname │     search     │       │
│          │  │                     │                 │      │          ├─┬──────────────┤       │
│          │  │                     │                 │      │          │ │    query     │       │
"  https:   //    user   :   pass   @ sub.example.com : 8080   /p/a/t/h  ?  query=string   #hash "
│          │  │          │          │    hostname     │ port │          │                │       │
│          │  │          │          ├─────────────────┴──────┤          │                │       │
│ protocol │  │ username │ password │          host          │          │                │       │
├──────────┴──┼──────────┴──────────┼────────────────────────┤          │                │       │
│   origin    │                     │         origin         │ pathname │     search     │ hash  │
├─────────────┴─────────────────────┴────────────────────────┴──────────┴────────────────┴───────┤
│                                              href                                              │
└────────────────────────────────────────────────────────────────────────────────────────────────┘
  4. Highly ranked article from Matt Cutts

URL: http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#00h02m30s

  • The protocol is http. Other protocols include https, ftp, etc.
  • The host or hostname is video.google.co.uk.
  • The subdomain is video.
  • The domain name is google.co.uk.
  • The top-level domain or TLD is uk. The uk domain is also referred to as a country-code top-level domain or ccTLD. For google.com, the TLD would be com.
  • The second-level domain (SLD) is co.uk.
  • The port is 80, which is the default port for web servers. Other ports are possible; a web server can listen on port 8000, for example. When the port is 80, most people leave out the port.
  • The path is /videoplay. Path typically refers to a file or location on the web server, e.g. /directory/file.html
  • This URL has parameters. The name of one parameter is docid and the value of that parameter is -7246927612831078230. URLs can have lots of parameters. Parameters start with a question mark (?) and are separated with an ampersand (&).
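For comparison, here is a minimal sketch of how the same example URL breaks down with the WHATWG URL class, which is available as a global in Node.js and in browsers. The variable names (`summary`, `labels`, `tld`) are illustrative only. The URL class exposes no property for the TLD or the subdomain, so the last part simply splits the hostname on dots; that is a naive assumption, and finding the registrable domain (google.co.uk here) reliably would need something like the Public Suffix List.

```javascript
// Parse the example URL from the article with the global WHATWG URL class.
const url = new URL(
  'http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#00h02m30s'
);

const summary = {
  protocol: url.protocol,               // includes the trailing ':'
  hostname: url.hostname,               // no port
  port: url.port,                       // empty string: 80 is the default port for http
  pathname: url.pathname,
  docid: url.searchParams.get('docid'), // parameter values come back as strings
  hl: url.searchParams.get('hl'),
  hash: url.hash,                       // includes the leading '#'
};

// Naive label split (assumption: plain dot-separated labels, no Public Suffix List).
const labels = url.hostname.split('.');
const tld = labels[labels.length - 1];

console.log(summary);
console.log({ labels, tld });
```

Note that the explicit :80 disappears from `port` and `host`, because the parser drops a port that matches the scheme's default.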

Some of my concerns:

  1. Is window.location a standard or based on a standard?

  2. Shall I call http:// the protocol or the scheme?

  3. Shall I say host or authority?

  4. Why do neither window.location nor Node have properties for the TLD or other domain parts, when available?

  5. Is the terminological difference between hostname (example.com) and host (example.com:8080) well established?

  6. For Node, origin does not include username:password@, whilst for window.location it does.
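Concerns 5 and 6 can be checked directly. The sketch below parses the example URL from the Node diagram with the WHATWG URL class (a global in Node.js and in browsers) and collects the properties in question; the names `u` and `parts` are illustrative. Browsers' window.location and Node's URL both follow the WHATWG URL Standard, so the origin should be serialised without credentials in either environment.

```javascript
// Same example URL as in the Node.js documentation diagram.
const u = new URL('https://user:pass@sub.example.com:8080/p/a/t/h?query=string#hash');

const parts = {
  protocol: u.protocol, // scheme plus the trailing ':'
  username: u.username,
  password: u.password,
  host: u.host,         // hostname plus ':' plus port
  hostname: u.hostname, // hostname only
  port: u.port,         // always a string
  pathname: u.pathname,
  search: u.search,     // includes the leading '?'
  hash: u.hash,         // includes the leading '#'
  origin: u.origin,     // scheme, hostname and port; no credentials
};

console.log(parts);
```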

I'd like to follow a well-established standard or best practices in my code.

João Pimentel Ferreira asked Feb 16 '19 11:02





2 Answers

The URI standard is STD 66. This is currently mapped to RFC 3986.

So for the generic URI syntax, these terms are authoritative, currently:

  • scheme
  • authority
  • userinfo
  • host
  • port
  • path
  • query
  • fragment
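
To make these terms concrete, here is a minimal sketch that splits a URI reference into the five generic components using the regular expression given in RFC 3986, Appendix B, and then splits the authority into userinfo, host and port. The function names `splitUri` and `splitAuthority` are hypothetical, and the authority split is a simplified assumption rather than a full implementation of the RFC's grammar (it does no validation).

```javascript
// Regular expression from RFC 3986, Appendix B ("Parsing a URI Reference").
const URI_RE = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Split a URI reference into the generic components named by RFC 3986.
// Components that are absent come back as undefined; path is always a string.
function splitUri(uri) {
  const m = URI_RE.exec(uri); // every part is optional, so this always matches
  return { scheme: m[2], authority: m[4], path: m[5], query: m[7], fragment: m[9] };
}

// Simplified split of authority = [ userinfo "@" ] host [ ":" port ].
function splitAuthority(authority) {
  const at = authority.lastIndexOf('@');
  const userinfo = at === -1 ? undefined : authority.slice(0, at);
  const hostPort = authority.slice(at + 1);
  // A trailing ":digits" is the port; bracketed IPv6 hosts end in ']' and are left intact.
  const m = /^(.*?)(?::(\d*))?$/.exec(hostPort);
  return { userinfo, host: m[1], port: m[2] };
}

console.log(splitUri('foo://example.com:8042/over/there?name=ferret#nose'));
console.log(splitUri('urn:example:animal:ferret:nose'));
console.log(splitAuthority('user:pass@example.com:8042'));
```

With the two example URIs from RFC 3986 section 3, the first yields all five components, while the second has only a scheme (urn) and a path (example:animal:ferret:nose).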
unor answered Oct 18 '22 23:10



Terminology depends on which architectural style/technology you are using.

I prefer the REST style for identifying the different parts of my URL: REST URI Standard

But I repeat: there is no single universal standard for naming the parts of a URL.

sandesh dahake answered Oct 18 '22 23:10
