
How can I concatenate regex literals in JavaScript?


Here is how to create a regular expression without using the regular expression literal syntax. This lets you do arbitrary string manipulation before the string becomes a regular expression object:

var segment_part = "some bit of the regexp";
var pattern = new RegExp("some regex segment" + /*comment here */
              segment_part + /* that was defined just now */
              "another segment");

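One thing to watch for with the string approach is that backslashes must be doubled inside ordinary string literals. A minimal sketch, where digits, separator and datePattern are made-up names for illustration only:

```javascript
// Hypothetical segment names, for illustration only.
var digits = "\\d+";      // the regex \d+ written as a string literal
var separator = "[-.]";   // no backslash here, so no extra escaping is needed
var datePattern = new RegExp("^" + digits + separator + digits + "$");

console.log(datePattern.source);          // ^\d+[-.]\d+$
console.log(datePattern.test("12-2024")); // true
console.log(datePattern.test("12x2024")); // false
```
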
If you have two regular expression literals, you can in fact concatenate them using this technique:

var regex1 = /foo/g;
var regex2 = /bar/y;
// Merge both flag strings, sort them, and drop duplicate flags.
var flags = (regex1.flags + regex2.flags).split("").sort().join("").replace(/(.)(?=.*\1)/g, "");
var regex3 = new RegExp(regex1.source + regex2.source, flags);
// regex3 is now /foobar/gy

It's just wordier than having expressions one and two be literal strings instead of literal regular expressions.
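
The same idea could be wrapped in a small helper that accepts any number of expressions. This is only a sketch: concatRegex is a hypothetical name, and taking the union of all flags is an assumption that may not suit every case, since a flag such as i then applies to the whole combined pattern.

```javascript
// Hypothetical helper: concatenate any number of RegExp objects,
// keeping the union of their flags (each flag at most once).
function concatRegex() {
  var parts = Array.prototype.slice.call(arguments);
  var source = parts.map(function (r) { return r.source; }).join("");
  var allFlags = parts.map(function (r) { return r.flags; }).join("");
  // A Set built from a string holds each character once.
  var flags = Array.from(new Set(allFlags)).join("");
  return new RegExp(source, flags);
}

var combined = concatRegex(/foo/gi, /bar/g, /baz/i);
console.log(combined.source); // foobarbaz
console.log(combined.flags);  // gi
```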


Simply concatenating regular expression objects can have adverse side effects. Use each expression's source property instead:

var r1 = /abc/g;
var r2 = /def/;
var r3 = new RegExp(r1.source + r2.source, 
                   (r1.global ? 'g' : '') 
                   + (r1.ignoreCase ? 'i' : '') + 
                   (r1.multiline ? 'm' : ''));
console.log(r3);
var m = 'test that abcdef and abcdef has a match?'.match(r3);
console.log(m);
// m should contain 2 matches

This also lets you carry over the flags of an existing RegExp (here r1) by reading its standard flag properties.


I don't quite agree with the "eval" option.

var xxx = /abcd/;
var yyy = /efgh/;
var zzz = new RegExp(eval(xxx)+eval(yyy));

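will give "//abcd//efgh//" (current engines print it as /\/abcd\/\/efgh\//), which is not the intended result. eval returns a non-string argument unchanged, and + then coerces both regexes to strings, slashes included.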

Using source like

var zzz = new RegExp(xxx.source+yyy.source);

will give "/abcdefgh/" and that is correct.

Logically there is no need to EVALUATE, you know your EXPRESSION. You just need its SOURCE, that is, how it is written, not necessarily its value. As for the flags, you just need to use the optional second argument of RegExp.
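
To make the difference concrete, here is a small sketch of what string coercion and source each return (the i flag on yyy is added here only for illustration):

```javascript
var xxx = /abcd/;
var yyy = /efgh/i;

// Coercing a RegExp to a string keeps the slashes and any flags.
console.log(String(xxx)); // /abcd/
console.log(xxx + yyy);   // /abcd//efgh/i

// .source is only the pattern text; flags go in the second argument.
var zzz = new RegExp(xxx.source + yyy.source, "i");
console.log(zzz.source);           // abcdefgh
console.log(zzz.test("ABCDEFGH")); // true
```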

In my situation, I do run into the issue of ^ and $ being used in several of the expressions I am trying to concatenate together! Those expressions are grammar filters used across the program. Now I want to use some of them together to handle the case of PREPOSITIONS. I may have to "slice" the sources to remove the starting and ending ^( and/or )$ :) Cheers, Alex.
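
One possible way to do that slicing, as a rough sketch only: stripAnchors and joinAnchored are hypothetical helpers, and they assume each expression has at most one leading ^ and one trailing unescaped $.

```javascript
// Hypothetical helper: drop one leading ^ and one trailing unescaped $.
function stripAnchors(regex) {
  var src = regex.source;
  if (src.charAt(0) === "^") src = src.slice(1);
  // Strip the final $ only when an even number of backslashes precedes it,
  // i.e. when it is an anchor and not an escaped literal \$.
  if (/(^|[^\\])(\\\\)*\$$/.test(src)) src = src.slice(0, -1);
  return src;
}

// Hypothetical helper: concatenate the stripped sources and re-anchor the whole.
function joinAnchored() {
  var body = Array.prototype.slice.call(arguments).map(function (r) {
    return "(?:" + stripAnchors(r) + ")"; // non-capturing group keeps alternations intact
  }).join("");
  return new RegExp("^" + body + "$");
}

var word = /^[a-z]+$/;
var number = /^\d+$/;
var wordThenNumber = joinAnchored(word, number);
console.log(wordThenNumber.source);         // ^(?:[a-z]+)(?:\d+)$
console.log(wordThenNumber.test("abc123")); // true
console.log(wordThenNumber.test("123abc")); // false
```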


There is a problem if the regexps contain backreferences like \1.

var r = /(a|b)\1/  // Matches aa, bb but nothing else.
var p = /(c|d)\1/   // Matches cc, dd but nothing else.

Then just concatenating the sources will not work. Indeed, the combination of the two is:

var rp = /(a|b)\1(c|d)\1/
rp.test("aadd") // Returns false

The solution: first we count the number of capturing groups in the first regex, then we increment each backreference in the second regex by that number.

function concatenate(r1, r2) {
  // Count the matches of r in str; match() returns null when there are none.
  var count = function(r, str) {
    return (str.match(r) || []).length;
  };
  // Home-made regexp to count groups. It is a heuristic: lookaheads such as (?= are counted too.
  var numberGroups = /([^\\]|^)(?=\((?!\?:))/g;
  var offset = count(numberGroups, r1.source);
  var escapedMatch = /[\\](?:(\d+)|.)/g;        // Home-made regexp for escaped literals, greedy on numbers.
  var r2newSource = r2.source.replace(escapedMatch, function(match, number) {
    // Shift numeric backreferences by the group count of r1; leave other escapes alone.
    return number ? "\\" + (Number(number) + offset) : match;
  });
  // Only the flags of r1 are kept.
  return new RegExp(r1.source + r2newSource,
      (r1.global ? 'g' : '')
      + (r1.ignoreCase ? 'i' : '')
      + (r1.multiline ? 'm' : ''));
}

Test:

var rp = concatenate(r, p) // returns  /(a|b)\1(c|d)\2/
rp.test("aadd") // Returns true
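
In engines that support named capture groups (ES2018 and later), a different way around the renumbering problem is to use named backreferences, which do not depend on group position. This is a swapped-in technique, not the method above, and the group names first and second are made up for illustration. Each name must be unique across the combined pattern.

```javascript
// Named groups and \k<name> backreferences survive concatenation unchanged.
var r = /(?<first>a|b)\k<first>/;   // Matches aa, bb.
var p = /(?<second>c|d)\k<second>/; // Matches cc, dd.

var rp = new RegExp(r.source + p.source);
console.log(rp.test("aadd")); // true
console.log(rp.test("aadc")); // false
```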

Providing that:

  • you know what you are doing in your regexp;
  • you have many regex pieces to form a pattern and they will all use the same flags;
  • you find it more readable to separate your small pattern chunks into an array;
  • you also want to be able to comment each part for next dev or yourself later;
  • you prefer to visually simplify your regex like /this/g rather than new RegExp('this', 'g');
  • it's ok for you to assemble the regex in an extra step rather than having it in one piece from the start;

Then you may like to write this way:

var regexParts =
    [
        /\b(\d+|null)\b/,// Some comments.
        /\b(true|false)\b/,
        /\b(new|getElementsBy(?:Tag|Class|)Name|arguments|getElementById|if|else|do|null|return|case|default|function|typeof|undefined|instanceof|this|document|window|while|for|switch|in|break|continue|length|var|(?:clear|set)(?:Timeout|Interval))(?=\W)/,
        /(\$|jQuery)/,
        /many more patterns/
    ],
    regexString  = regexParts.map(function(x){return x.source}).join('|'),
    regexPattern = new RegExp(regexString, 'g');

You can then do something like:

string.replace(regexPattern, function()
{
    var m = arguments,
        Class = '';

    switch(true)
    {
        // Numbers and 'null'.
        case (Boolean)(m[1]):
            m = m[1];
            Class = 'number';
            break;

        // True or False.
        case (Boolean)(m[2]):
            m = m[2];
            Class = 'bool';
            break;

        // Keywords.
        case (Boolean)(m[3]):
            m = m[3];
            Class = 'keyword';
            break;

        // $ or 'jQuery'.
        case (Boolean)(m[4]):
            m = m[4];
            Class = 'dollar';
            break;

        // More cases...
    }

    return '<span class="' + Class + '">' + m + '</span>';
})
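
A more compact variant of the same idea, as a sketch: it keeps each pattern next to its class name and picks the first capture group that matched. The names parts, pattern and highlight are hypothetical, the keyword list is shortened, and the code assumes every part contains exactly one capture group.

```javascript
// Each entry pairs a CSS class with a pattern holding exactly one capture group.
var parts = [
  { cls: "number",  re: /\b(\d+|null)\b/ },
  { cls: "bool",    re: /\b(true|false)\b/ },
  { cls: "keyword", re: /\b(var|return|if|else)\b/ }
];
var pattern = new RegExp(
  parts.map(function (p) { return p.re.source; }).join("|"), "g");

function highlight(code) {
  return code.replace(pattern, function (match) {
    // arguments[1..n] are the capture groups, one per entry in parts.
    for (var i = 0; i < parts.length; i++) {
      if (arguments[i + 1] !== undefined) {
        return '<span class="' + parts[i].cls + '">' + match + "</span>";
      }
    }
    return match;
  });
}

console.log(highlight("var x = 42;"));
// <span class="keyword">var</span> x = <span class="number">42</span>;
```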

In my particular case (a code-mirror-like editor), it is much easier to run one big regex than a chain of replaces like the following. Each time I wrap an expression in an HTML tag, the next pattern becomes harder to target without affecting the tag itself (and lookbehind, which would help, was not supported in JavaScript before ES2018):

.replace(/(\b\d+|null\b)/g, '<span class="number">$1</span>')
.replace(/(\btrue|false\b)/g, '<span class="bool">$1</span>')
.replace(/\b(new|getElementsBy(?:Tag|Class|)Name|arguments|getElementById|if|else|do|null|return|case|default|function|typeof|undefined|instanceof|this|document|window|while|for|switch|in|break|continue|var|(?:clear|set)(?:Timeout|Interval))(?=\W)/g, '<span class="keyword">$1</span>')
.replace(/\$/g, '<span class="dollar">$</span>')
.replace(/([\[\](){}.:;,+\-?=])/g, '<span class="ponctuation">$1</span>')

It would be preferable to use the literal syntax as often as possible. It's shorter, more legible, and you do not need to escape quotes or double-escape backslashes. From "JavaScript Patterns", Stoyan Stefanov, 2010.

But using new RegExp may be the only way to concatenate.

I would avoid eval. It's not safe.


You can concatenate the source of both a regex literal and a RegExp object:

var xxx = new RegExp(/abcd/);
var zzz = new RegExp(xxx.source + /efgh/.source);
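
As a side note, here is a short sketch of what passing a regex to the constructor does. The variable names are illustrative, and the two-argument form with a RegExp as the first argument assumes an ES2015 or later engine.

```javascript
var original = /abcd/g;

var copy = new RegExp(original);           // new object, same source and flags
var withFlags = new RegExp(original, "i"); // ES2015+: the given flags replace the original ones

console.log(copy !== original); // true
console.log(copy.flags);        // g
console.log(withFlags.flags);   // i

// The copy can be concatenated through .source like any other regex.
var joined = new RegExp(copy.source + /efgh/.source, copy.flags);
console.log(joined.source); // abcdefgh
console.log(joined.flags);  // g
```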