Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I match multiple occurrences with a regex in JavaScript similar to PHP's preg_match_all()?

People also ask

How do I match a pattern in regex?

To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (back-slash).

What does regex (? S match?

i) makes the regex case insensitive. (? s) for "single line mode" makes the dot match all characters, including line breaks.

Which modifier is used for matching a regex in the complete string in JavaScript?

The "m" modifier specifies a multiline match. It only affects the behavior of start ^ and end $. ^ specifies a match at the start of a string.

What does Preg_match_all return?

The preg_match_all() function returns the number of matches of a pattern that were found in a string and populates a variable with the matches that were found.


Hoisted from the comments

2020 comment: rather than using regex, we now have URLSearchParams, which does all of this for us, so no custom code, let alone regex, are necessary anymore.

– Mike 'Pomax' Kamermans

Browser support is listed here https://caniuse.com/#feat=urlsearchparams


I would suggest an alternative regex, using sub-groups to capture name and value of the parameters individually and re.exec():

function getUrlParams(url) {
  var re = /(?:\?|&(?:amp;)?)([^=&#]+)(?:=?([^&#]*))/g,
      match, params = {},
      decode = function (s) {return decodeURIComponent(s.replace(/\+/g, " "));};

  if (typeof url == "undefined") url = document.location.href;

  while (match = re.exec(url)) {
    params[decode(match[1])] = decode(match[2]);
  }
  return params;
}

var result = getUrlParams("http://maps.google.de/maps?f=q&source=s_q&hl=de&geocode=&q=Frankfurt+am+Main&sll=50.106047,8.679886&sspn=0.370369,0.833588&ie=UTF8&ll=50.116616,8.680573&spn=0.35972,0.833588&z=11&iwloc=addr");

result is an object:

{
  f: "q"
  geocode: ""
  hl: "de"
  ie: "UTF8"
  iwloc: "addr"
  ll: "50.116616,8.680573"
  q: "Frankfurt am Main"
  sll: "50.106047,8.679886"
  source: "s_q"
  spn: "0.35972,0.833588"
  sspn: "0.370369,0.833588"
  z: "11"
}

The regex breaks down as follows:

(?:            # non-capturing group
  \?|&         #   "?" or "&"
  (?:amp;)?    #   (allow "&", for wrongly HTML-encoded URLs)
)              # end non-capturing group
(              # group 1
  [^=]+      #   any character except "=", "&" or "#"; at least once
)              # end group 1 - this will be the parameter's name
(?:            # non-capturing group
  =?           #   an "=", optional
  (            #   group 2
    [^]*     #     any character except "&" or "#"; any number of times
  )            #   end group 2 - this will be the parameter's value
)              # end non-capturing group

You need to use the 'g' switch for a global search

var result = mystring.match(/(&|&)?([^=]+)=([^&]+)/g)

2020 edit

Use URLSearchParams, as this job no longer requires any kind of custom code. Browsers can do this for you with a single constructor:

const str = "1111342=Adam%20Franco&348572=Bob%20Jones";
const data = new URLSearchParams(str);
for (pair of data) console.log(pair)

yields

Array [ "1111342", "Adam Franco" ]
Array [ "348572", "Bob Jones" ]

So there is no reason to use regex for this anymore.

Original answer

If you don't want to rely on the "blind matching" that comes with running exec style matching, JavaScript does come with match-all functionality built in, but it's part of the replace function call, when using a "what to do with the capture groups" handling function:

var data = {};

var getKeyValue = function(fullPattern, group1, group2, group3) {
  data[group2] = group3;
};

mystring.replace(/(?:&|&)?([^=]+)=([^&]+)/g, getKeyValue);

done.

Instead of using the capture group handling function to actually return replacement strings (for replace handling, the first arg is the full pattern match, and subsequent args are individual capture groups) we simply take the groups 2 and 3 captures, and cache that pair.

So, rather than writing complicated parsing functions, remember that the "matchAll" function in JavaScript is simply "replace" with a replacement handler function, and much pattern matching efficiency can be had.


For capturing groups, I'm used to using preg_match_all in PHP and I've tried to replicate it's functionality here:

<script>

// Return all pattern matches with captured groups
RegExp.prototype.execAll = function(string) {
    var match = null;
    var matches = new Array();
    while (match = this.exec(string)) {
        var matchArray = [];
        for (i in match) {
            if (parseInt(i) == i) {
                matchArray.push(match[i]);
            }
        }
        matches.push(matchArray);
    }
    return matches;
}

// Example
var someTxt = 'abc123 def456 ghi890';
var results = /[a-z]+(\d+)/g.execAll(someTxt);

// Output
[["abc123", "123"],
 ["def456", "456"],
 ["ghi890", "890"]]

</script>

Set the g modifier for a global match:

/…/g