
Match pattern for all Google search pages

I'm developing an extension that will perform a certain action on all Google search URLs, but not on other websites or other Google pages. In natural language, the match pattern is:

  • Any protocol ('*://')
  • Any subdomain or none ('www' or '')
  • The domain string must equal 'google'
  • Any TLD including three-letter TLDs (e.g. '.com') and multi-part country TLDs (e.g. '.co.uk')
  • The first 8 characters of the path (counting the '?' that starts the query string) must equal '/search?'

Many people say that to match all Google search pages you should use "*://*.google.com/search?*", but this is patently untrue, as it will not match national TLDs like google.co.uk.

Thus the following code does not work at all:

chrome.webRequest.onBeforeRequest.addListener(
  function(details) {
    alert('This never happens');
  }, {
    urls: [
        "*://*.google.*/search?*",
        "*://google.*/search?*",
    ],
    types: ["main_frame"]
  },
  ["blocking"]
);

Using "*://*.google.com/search?*" as the match pattern does work, but I fear I would need a list of every single Google localisation for that to be an effective strategy.

Asked by DMCoding, Dec 20 '22 14:12


2 Answers

Unfortunately, match patterns do not allow wildcards for TLDs for security reasons.

You cannot use wildcard match patterns like http://google.*/* to match TLDs (like http://google.es and http://google.fr) due to the complexity of actually restricting such a match to only the desired domains.

For the example of http://google.*/*, the Google domains would be matched, but so would http://google.someotherdomain.com. Additionally, many sites do not own all of the TLDs for their domain. For example, assume you want to use http://example.*/* to match http://example.com and http://example.es, but http://example.net is a hostile site. If your extension has a bug, the hostile site could potentially attack your extension in order to gain access to your extension's increased privileges.

You should explicitly enumerate the TLDs that you wish to run your extension on.
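To see why the question's patterns are rejected, here is a simplified check of the host part of a match pattern, modelled on the grammar in Chrome's match-pattern documentation: a "*" may only be the whole host or a leading "*.". The helper names are illustrative and not part of any Chrome API, and this is a sketch, not Chrome's actual validator.

```javascript
// Simplified host-part check modelled on the match-pattern grammar in Chrome's
// docs: the host is "*", "*." followed by a hostname, or a plain hostname.
// A "*" anywhere else (for example in the TLD position) is not allowed.
function isValidHostPattern(host) {
  if (host === "*") return true;
  const rest = host.startsWith("*.") ? host.slice(2) : host;
  return rest.length > 0 && !rest.includes("*");
}

// Extract <host> from "<scheme>://<host><path>"; null if the shape is wrong.
function hostOfPattern(pattern) {
  const match = /^([^:/]+):\/\/([^/]*)(\/.*)$/.exec(pattern);
  return match ? match[2] : null;
}

const patterns = [
  "*://*.google.com/search?*", // accepted: leading "*." only
  "*://*.google.*/search?*",   // rejected: "*" in the TLD
  "*://google.*/search?*",     // rejected: "*" in the TLD
];
for (const p of patterns) {
  const host = hostOfPattern(p);
  console.log(p, host !== null && isValidHostPattern(host));
}
```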

A slightly unrealistic option would be to list every variant, covering all national TLDs.

Edit: thanks to an incredibly helpful comment by rsanchez, here's an up-to-date list of all Google domain variants, which makes this approach viable.
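If you go the enumeration route, the patterns can be generated from a domain list instead of typed by hand. A minimal sketch, assuming a hypothetical GOOGLE_DOMAINS array that you would fill from the full list; the four entries here are only examples. It drops the question's "blocking" option because this listener returns nothing (blocking listeners also need the "webRequestBlocking" permission).

```javascript
// Hypothetical, deliberately short list: fill it from the full set of Google
// domains you want to support.
const GOOGLE_DOMAINS = ["google.com", "google.co.uk", "google.de", "google.fr"];

// One match pattern per domain. A leading "*." also matches the bare domain,
// so both google.co.uk and www.google.co.uk are covered.
function buildSearchPatterns(domains) {
  return domains.map((domain) => `*://*.${domain}/search?*`);
}

const searchPatterns = buildSearchPatterns(GOOGLE_DOMAINS);

// Register the listener only when running inside an extension context.
if (typeof chrome !== "undefined" && chrome.webRequest) {
  chrome.webRequest.onBeforeRequest.addListener(
    (details) => {
      console.log("Google search request:", details.url);
    },
    { urls: searchPatterns, types: ["main_frame"] }
  );
}
```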

A realistic option is to inject into a larger set of pages (for instance, all pages), then analyze the URL (with a regexp, for example) and execute only if it matches the pattern you are looking for. Yes, this will trigger a scarier permissions warning, and you will have to explain it to your users.
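A minimal sketch of that approach, assuming the extension already has access to all URLs (the "<all_urls>" filter below) and using a hypothetical isGoogleSearchUrl helper that encodes the question's criteria. The host regexp only checks the form of the hostname ("google." followed by "com", a two-letter TLD, "co.xx" or "com.xx"); it cannot tell a Google-owned domain from a look-alike, which is exactly the risk described above, so combining it with an allow-list is safer.

```javascript
// Optional "www.", then "google.", then com / two-letter TLD / co.xx / com.xx.
// This checks the form of the host only, not who owns the domain.
const GOOGLE_HOST = /^(www\.)?google\.(com|[a-z]{2}|co\.[a-z]{2}|com\.[a-z]{2})$/;

// Hypothetical helper: true for http(s) URLs on a Google-looking host whose
// path is "/search" with a non-empty query string.
function isGoogleSearchUrl(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  return (
    (url.protocol === "http:" || url.protocol === "https:") &&
    GOOGLE_HOST.test(url.hostname) &&
    url.pathname === "/search" &&
    url.search !== ""
  );
}

// Listen broadly, then filter in code. Only runs inside an extension.
if (typeof chrome !== "undefined" && chrome.webRequest) {
  chrome.webRequest.onBeforeRequest.addListener(
    (details) => {
      if (!isGoogleSearchUrl(details.url)) return; // ignore everything else
      console.log("Google search request:", details.url);
    },
    { urls: ["<all_urls>"], types: ["main_frame"] }
  );
}
```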

Answered by Xan, Dec 24 '22 01:12


Source: https://stackoverflow.com/a/16187588/6250024

I was wondering the same thing and found the same question with a better solution, which uses the "include_globs" key:

"matches":        ["http://*/*", "https://*/*"],
"include_globs":  ["http://www.google.*/*", "https://www.google.*/*"],
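Two caveats, under the documented glob rules ("*" matches any string, "?" matches exactly one character). First, "include_globs" belongs to a "content_scripts" entry in manifest.json, so it filters content scripts, not the question's chrome.webRequest listener. Second, a glob "*" also matches dots and slashes, so these globs accept look-alike hosts such as www.google.example.com and skip hosts without "www." such as google.co.uk. The sketch below checks that with a hypothetical globToRegExp helper; it is an illustration of the glob rules, not Chrome's implementation.

```javascript
// Hypothetical helper: turn a content-script glob into an anchored RegExp.
// "*" -> any run of characters (".*"), "?" -> exactly one character (".").
function globToRegExp(glob) {
  // Escape regex metacharacters except "*" and "?", which are handled below.
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$");
}

const glob = globToRegExp("http://www.google.*/*");
console.log(glob.test("http://www.google.co.uk/search?q=x"));  // intended match
console.log(glob.test("http://www.google.example.com/"));      // look-alike also matches
console.log(glob.test("http://google.co.uk/search?q=x"));      // no "www.", no match
```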
Answered by Dididi, Dec 24 '22 00:12