
How does the Chrome address bar determine whether input is a URL or a search string?

- (BOOL)validateUrl:(NSString *)candidate {
    // "http://" or "https://", then word characters (letters, digits, '_')
    // and hyphens, split into segments by '.' or '/'.
    NSString *urlRegEx = @"(http|https)://[\\w-]*([./][\\w-]*)+";
    NSPredicate *urlTest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", urlRegEx];
    if ([urlTest evaluateWithObject:candidate]) {
        return YES;
    }

    // Otherwise, accept anything that contains a known domain suffix.
    for (NSString *suffix in @[@".com", @".net", @".org", @".cn", @".jp"]) {
        if ([candidate containsString:suffix]) {
            return YES;
        }
    }
    return NO;
}

This relies on a long list of domain suffixes: ".com", ".net", ".org", and so on. Users don't have to type "http" at the front of the address bar.

So how does the Chrome address bar decide whether the input is a URL or a search string?

If I enter "a.fa", it's not treated as a URL.
"a a.com" is treated as a search string.
"a.mobi/aaa" is treated as a URL.
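To see why the suffix test in validateUrl: can't reproduce these results, here is the same check as a plain C function (plain C is also valid Objective-C, so it could sit in the same .m file). The name containsKnownSuffix is hypothetical, strstr() stands in for -containsString:, and the scheme regex is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if s contains any of the hard-coded suffixes
   anywhere in the string, just like the containsString: calls. */
bool containsKnownSuffix(const char *s) {
    static const char *const suffixes[] = {".com", ".net", ".org", ".cn", ".jp"};
    for (size_t i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        if (strstr(s, suffixes[i]) != NULL) {
            return true;
        }
    }
    return false;
}
```

With this sketch, "a.fa" gives false (a search, as in Chrome), but "a a.com" gives true because ".com" appears somewhere in the string, and "a.mobi/aaa" gives false because ".mobi" is not in the list. Chrome classifies those last two the other way round, so a substring test alone can't match its behaviour.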
asked Oct 31 '22 20:10 by Gank


1 Answer

The answer could be found in the Chromium source, as funroll mentioned, but here's the basic idea of what's going on, at least according to my testing.

A string entered into the omnibox is treated as a URL if it follows the format:

[protocol][subdomains].[subdomains].[domain name].[tld]

Here the subdomains (optional, of course) and the domain name contain only letters (for Chrome, this seems to include accented letters), numbers, and hyphens, and the TLD (top-level domain) comes from an approved list (.com, .net, etc.), unless a protocol is specified, in which case any TLD is treated as valid. Protocols also come from a set list, but can be written in pretty much any format: the protocol name, a colon, then any number of slashes (including none). If the protocol is not on the list, the entire input is treated as a search instead.
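A minimal C sketch of the host and TLD rules above (plain C, so it also compiles as Objective-C). The TLD list and all function names are hypothetical, and the anyTld flag stands in for "a protocol was specified". It approximates the behaviour observed in testing, not Chromium's actual code.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily abbreviated TLD allow-list. */
static const char *const kTlds[] = {"com", "net", "org", "cn", "jp", "mobi"};

/* Case-insensitive check of s[0..n) against the lowercase TLD list. */
static bool isKnownTld(const char *s, size_t n) {
    for (size_t k = 0; k < sizeof kTlds / sizeof kTlds[0]; k++) {
        if (strlen(kTlds[k]) != n) continue;
        size_t i = 0;
        while (i < n && tolower((unsigned char)s[i]) == kTlds[k][i]) i++;
        if (i == n) return true;
    }
    return false;
}

/* Letters, digits and hyphens; bytes >= 0x80 are let through so that
   UTF-8 accented letters such as "á" count as letters. */
static bool isLabelChar(unsigned char c) {
    return isalnum(c) || c == '-' || c >= 0x80;
}

/* host[0..n) must be two or more non-empty labels joined by '.'. The last
   label (the TLD) must be allow-listed unless anyTld is true. */
bool hostLooksValid(const char *host, size_t n, bool anyTld) {
    size_t start = 0, labels = 0;
    for (size_t i = 0; i < n; i++) {
        if (host[i] == '.') {
            if (i == start) return false;  /* empty label, e.g. "a..com" */
            labels++;
            start = i + 1;
        } else if (!isLabelChar((unsigned char)host[i])) {
            return false;                  /* space, '*', '/', ... */
        }
    }
    if (labels == 0 || start == n) return false;  /* no dot, or empty TLD */
    return anyTld || isKnownTld(host + start, n - start);
}
```

For example, "abc.stackoverflow.com" and "-stackoverflow.com" pass, "stack overflow.com" and "stackoverflow*.com" fail on their characters, and "stackoverflow.mynewtld" fails unless anyTld is true.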

If a slash follows a string in the above format (e.g., stackoverflow.com/), then anything after the slash is accepted.
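In code, the path rule amounts to cutting the text at its first slash and only validating what comes before it. A sketch, with hostLength as a hypothetical helper that assumes any protocol and its slashes have already been stripped:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of leading characters that form the host.
   Everything from the first '/' on is the path and is not validated. */
size_t hostLength(const char *afterScheme) {
    return strcspn(afterScheme, "/");
}
```

So "a.mobi/aaa" yields a 6-character host ("a.mobi"), while "stack/overflow.com" yields just "stack", which has no TLD; that is consistent with it being listed as invalid.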

Alternatively, if a slash occurs at the start of the string, Chrome treats it as a URL as well (with the file:// protocol).


Examples of valid URLs (according to Chrome):

  • stackoverflow.com
  • abc.stackoverflow.com
  • abc.abc.abc.abc.stackoverflow.com
  • stáckoverflow.com (this changes the URL, but is allowed; try it!)
  • stack-overflow.com
  • -stackoverflow.com (might not even be a legal domain name, but it works)
  • 4stackoverflow.com
  • stackoverflow.com/not valid characters !@#$^æ
  • [http]://stackoverflow.com (the brackets aren't legal, but I can't include the link otherwise)
  • [http]:////stackoverflow.com
  • [http]:stackoverflow.com
  • [http]:stackoverflow.mynewtld

Examples of invalid URLs:

  • stack overflow.com
  • stackoverflow*.com
  • stack/overflow.com
  • stackoverflow.mynewtld

And, well, just about everything else.
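Putting the rules together (protocol, host, path, and the leading-slash case) gives something like the following C sketch. The scheme and TLD lists are hypothetical and far shorter than Chrome's, the function names are made up, and inputs such as IP addresses or ports are not modelled; it approximates the behaviour described in this answer rather than Chromium's omnibox logic.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, heavily abbreviated allow-lists. */
static const char *const kTlds[] = {"com", "net", "org", "cn", "jp", "mobi"};
static const char *const kSchemes[] = {"http", "https", "ftp", "file"};

/* Case-insensitive match of s[0..n) against any lowercase entry in list. */
static bool inList(const char *s, size_t n, const char *const list[], size_t count) {
    for (size_t k = 0; k < count; k++) {
        if (strlen(list[k]) != n) continue;
        size_t i = 0;
        while (i < n && tolower((unsigned char)s[i]) == list[k][i]) i++;
        if (i == n) return true;
    }
    return false;
}

/* Letters, digits, hyphens; bytes >= 0x80 pass so UTF-8 letters count. */
static bool isLabelChar(unsigned char c) {
    return isalnum(c) || c == '-' || c >= 0x80;
}

/* Two or more non-empty labels joined by '.', with an allow-listed TLD
   unless anyTld is true. */
static bool hostLooksValid(const char *host, size_t n, bool anyTld) {
    size_t start = 0, labels = 0;
    for (size_t i = 0; i < n; i++) {
        if (host[i] == '.') {
            if (i == start) return false;
            labels++;
            start = i + 1;
        } else if (!isLabelChar((unsigned char)host[i])) {
            return false;
        }
    }
    if (labels == 0 || start == n) return false;
    return anyTld || inList(host + start, n - start, kTlds, COUNT(kTlds));
}

/* Hypothetical classifier: true means "navigate", false means "search". */
bool looksLikeUrl(const char *input) {
    if (input[0] == '/') return true;  /* treated as a file:// path */

    const char *rest = input;
    bool anyTld = false;
    size_t i = 0;
    while (isalpha((unsigned char)input[i])) i++;
    if (i > 0 && input[i] == ':') {
        /* Unknown protocol: the whole input becomes a search. */
        if (!inList(input, i, kSchemes, COUNT(kSchemes))) return false;
        rest = input + i + 1;
        while (*rest == '/') rest++;  /* any number of slashes, even none */
        anyTld = true;                /* a protocol allows any TLD */
    }
    /* Only the part before the first '/' is checked; the path is free-form. */
    return hostLooksValid(rest, strcspn(rest, "/"), anyTld);
}
```

With this sketch, every entry in the valid list above (written with http instead of [http]) returns true, as do the question's "a.mobi/aaa" and a path like "/etc/hosts", while every entry in the invalid list, plus "a.fa" and "a a.com", returns false.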


Let's just hope there's a library out there somewhere to do all this instead.

answered Nov 27 '22 21:11 by username tbd