
Javascript Regular Expressions Functionality

Tags:

javascript

I've spent a few hours on this and I can't seem to figure this one out.

In the code below, I'm trying to understand exactly what and how the regular expressions in the url.match are working.

As the code is below, it doesn't work. However if I remove (?:&toggle=|&ie=utf-8|&FORM=|&aq=|&x=|&gwp) it seems to give me the output that I want.

However, I don't want to remove this without understanding what it is doing.

I found a pretty useful resource, but after a few hours I still can't precisely determine what these expressions are doing:

https://developer.mozilla.org/en-US/docs/JavaScript/Guide/Regular_Expressions#Using_Parenthesized_Substring_Matches

Could someone break this down for me and explain exactly how it is parsing the strings? The expressions themselves and the placement of the parentheses are not really clear to me and, frankly, very confusing.

Any help is appreciated.

(function($) {    

  $(document).ready(function() {         

      function parse_keywords(url){
          var matches = url.match(/.*(?:\?p=|\?q=|&q=|\?s=)([a-zA-Z0-9 +]*)(?:&toggle=|&ie=utf-8|&FORM=|&aq=|&x=|&gwp)/);
          return matches ? matches[1].split('+') : [];

      }
      myRefUrl = "http://www.google.com/url?sa=f&rct=j&url=https://www.mydomain.com/&q=my+keyword+from+google&ei=fUpnUaage8niAKeiICgCA&usg=AFQjCNFAlKg_w5pZzrhwopwgD12c_8z_23Q";

      myk1 = (parse_keywords(myRefUrl));

      kw="";

      for (i=0;i<myk1.length;i++) {
          if (i == (myk1.length - 1)) {
          kw = kw + myk1[i];
          }
          else {
          kw = kw + myk1[i] + '%20';
          }
      }

      console.log (kw);

      if (kw != null && kw != "" && kw != " " && kw != "%20") {

      orighref = $('a#applynlink').attr('href');
      $('a#applynlink').attr('href', orighref + '&scbi=' + kw);
      }                     

  });  

})(jQuery);
Russell asked Jan 14 '23 16:01


1 Answer

Let's break this regex down.

/

Begin regex.

.*

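Match zero or more of any character. Basically, we're willing to match the rest of this regex at any point in the string. Because .* is greedy, it ends up at the last point in the string where the rest of the regex can still match.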
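To see what the leading .* does in practice, here is a minimal sketch. The input string is a made-up example (it does not come from the question), and the regex is a cut-down version of the one being discussed.

```javascript
// Hypothetical input with two candidate markers; not from the question.
const twoQueries = "a?q=one&q=two";

// With a greedy .* in front, the engine backtracks from the end of the
// string, so the LAST marker that lets the rest match is the one used.
const withLeadingDotStar = twoQueries.match(/.*(?:\?q=|&q=)([a-z]*)/);

// Without it, the engine scans left to right and uses the FIRST marker.
const withoutDotStar = twoQueries.match(/(?:\?q=|&q=)([a-z]*)/);

console.log(withLeadingDotStar[1]); // "two"
console.log(withoutDotStar[1]);     // "one"
```

So the .* is not just padding: it changes which occurrence wins when a URL contains more than one of the markers.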

(?:\?p=
|\?q=
|&q=
|\?s=)

In this, the ?: means 'do not capture anything inside of this group'. See http://www.regular-expressions.info/refadv.html

The \? means take ? literally. Normally ? is a special character meaning 'match 0 or 1 copies of the previous token', but here we want to match an actual ?.

Other than that, it's just a list of different options to choose from (| means 'the regex is valid if I match either what's before me or what's after me').
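A small sketch of what ?: changes, using a made-up sample string and a shortened pattern (both are assumptions for illustration only). The only difference between the two patterns is whether the first group captures.

```javascript
// Hypothetical sample string, not taken from the question.
const sample = "page?q=hello";

// Non-capturing first group: the prefix is matched but not stored,
// so the word ends up in slot 1 of the result.
const nonCapturing = sample.match(/(?:\?p=|\?q=)([a-z]*)/);

// Capturing first group: the prefix takes slot 1 and the word moves to slot 2.
const capturing = sample.match(/(\?p=|\?q=)([a-z]*)/);

console.log(nonCapturing[1]);             // "hello"
console.log(capturing[1], capturing[2]);  // "?q=" "hello"
```

This is why the original code can read matches[1] directly: the only capturing group in the regex is the keyword one.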

([a-zA-Z0-9 +]*)

Now we match zero or more of any of the following characters, in any order: a-z, A-Z, 0-9, space and +. And since it is inside a () with no ?:, we DO capture it.

(?:&toggle=
|&ie=utf-8
|&FORM=
|&aq=
|&x=
|&gwp)

We see another ?: so this is another non-capturing group. Other than that, it is just full of literal characters separated by |s, so it is not doing any fancy logic.

/

End regex.

In summary, this regex looks through the string for any of the alternatives in the first non-capturing group, captures the run of letters, digits, spaces and + signs that follows it, and then requires one of the alternatives in the second non-capturing group immediately afterwards to 'cap' it off. If none of those endings follows, the whole match fails and match returns null. (Think of it as a 'sandwich': we look for the header and the footer and capture what sits between them.)
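Applied to the URL from the question, the keyword is followed by &ei=, which is not one of the alternatives in the second group, so the 'footer' never matches. The sketch below runs both versions of the regex against that URL; the variable names are mine, and dropping the footer is just the change the question already describes, not necessarily the best fix for every referrer.

```javascript
// URL copied from the question.
const refUrl = "http://www.google.com/url?sa=f&rct=j&url=https://www.mydomain.com/&q=my+keyword+from+google&ei=fUpnUaage8niAKeiICgCA&usg=AFQjCNFAlKg_w5pZzrhwopwgD12c_8z_23Q";

// Original regex: header, captured keyword, mandatory footer.
const withFooter = /.*(?:\?p=|\?q=|&q=|\?s=)([a-zA-Z0-9 +]*)(?:&toggle=|&ie=utf-8|&FORM=|&aq=|&x=|&gwp)/;

// Same regex with the footer group removed.
const withoutFooter = /.*(?:\?p=|\?q=|&q=|\?s=)([a-zA-Z0-9 +]*)/;

const strictMatch = refUrl.match(withFooter);
const looseMatch = refUrl.match(withoutFooter);

console.log(strictMatch);   // null, because "&ei=" is not an allowed footer
console.log(looseMatch[1]); // "my+keyword+from+google"
```

Without the footer, the capture simply stops at the first character outside [a-zA-Z0-9 +], which here is the & before ei=.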

After the regex runs, we do this:

return matches ? matches[1].split('+') : [];

Which grabs the captured group and splits it on + into an array of strings, or returns an empty array if the regex did not match.
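As a side note, the loop in the question that glues the pieces back together with %20 can be sketched with split and join. The captured string below is hard-coded as an assumption so the snippet runs on its own.

```javascript
// Assumed capture result for the question's URL once the regex matches.
const captured = "my+keyword+from+google";

// Same step as in parse_keywords: split the capture on "+".
const keywords = captured.split("+");

// join puts "%20" between the items only, never after the last one,
// which is what the if/else inside the question's for loop does by hand.
const kwJoined = keywords.join("%20");

console.log(keywords); // [ "my", "keyword", "from", "google" ]
console.log(kwJoined); // "my%20keyword%20from%20google"
```

When the regex does not match, parse_keywords returns [], and joining an empty array gives an empty string, so the existing kw != "" check still guards the href update.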

Patashu answered Jan 22 '23 00:01