
Regex reads from right to left

I was looking for a short piece of code that puts commas into a number, and I came across this site.

The code:

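function addCommas(nStr)
{
    nStr += '';                  // coerce the input to a string
    var x = nStr.split('.');     // [integer part, fractional part]
    var x1 = x[0];
    var x2 = x.length > 1 ? '.' + x[1] : '';
    var rgx = /(\d+)(\d{3})/;
    // Each pass inserts one comma before the last three digits
    // of the first run of four or more digits.
    while (rgx.test(x1)) {
        x1 = x1.replace(rgx, '$1' + ',' + '$2');
    }
    return x1 + x2;
}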

It works really well. Given this example number:

addCommas('83475934.89');  

This returns "83,475,934.89". But when I read the code, I expected it to return 8,3,4,7,5,934.89. The site explains that:

\d+ in combination with \d{3} will match a group of 3 numbers preceded by any amount of numbers. This tricks the search into replacing from right to left.

This is what confuses me. How does this code read from right to left? Plus, what does $1 and $2 mean?

fiberOptics asked May 30 '13 12:05


3 Answers

It isn't actually reading right-to-left. What's really happening is that it's repeatedly applying the (\d+)(\d{3}) pattern (via a while loop) and replacing until it no longer matches the pattern. In other words:

Iteration 1:

// x1 holds only the integer part; '.89' is kept in x2
x1 = '83475934'
x1 = x1.replace(/(\d+)(\d{3})/, '$1' + ',' + '$2');
// x1 is now '83475,934'

Iteration 2:

x1 = '83475,934'
x1 = x1.replace(/(\d+)(\d{3})/, '$1' + ',' + '$2');
// x1 is now '83,475,934'

Iteration 3:

x1 = '83,475,934'
/(\d+)(\d{3})/.test(x1)
// false: no run of four or more digits is left; end loop
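
To watch the iterations happen, here is a minimal sketch that records the value of x1 after every pass of the loop. The helper name traceCommas is made up for this illustration and is not part of the original function.

```javascript
// Hypothetical helper (not from the original code): runs the same loop as
// addCommas on the integer part and collects every intermediate value.
function traceCommas(integerPart) {
    var rgx = /(\d+)(\d{3})/;
    var x1 = String(integerPart);
    var steps = [x1];
    while (rgx.test(x1)) {
        x1 = x1.replace(rgx, '$1,$2');
        steps.push(x1);
    }
    return steps;
}

console.log(traceCommas('83475934'));
// [ '83475934', '83475,934', '83,475,934' ]
```

Each entry has exactly one more comma than the previous one, and the loop stops once no run of four or more digits is left.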

Edit:

Plus, what does $1 and $2 mean?

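Those are references to the capture groups (\d+) and (\d{3}) respectively. In the replacement string, $1 stands for the text matched by the first parenthesized group and $2 for the text matched by the second.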
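
As a small illustration of what $1 and $2 hold, the sketch below inspects the groups with exec and then swaps them in a replacement. The variable names and the '|' separator are arbitrary choices for the demo.

```javascript
var pattern = /(\d+)(\d{3})/;

// exec returns [full match, group 1, group 2]
var groups = pattern.exec('83475934');
console.log(groups[1]); // '83475'  (what $1 refers to)
console.log(groups[2]); // '934'    (what $2 refers to)

// The same groups can be reused in any order in the replacement string.
var swapped = '83475934'.replace(pattern, '$2|$1');
console.log(swapped);   // '934|83475'
```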

Here's a great reference for learning how Regular Expressions actually work:
http://www.regular-expressions.info/quickstart.html

Brian Lacy answered Nov 11 '22 08:11


It appears to match from right to left because \d+ is greedy. The engine first lets \d+ consume as many digits as it can, then has to backtrack, giving digits back until \d{3} can match. In the number 2421567.56, for example, \d+ first matches all the digits up to the '.' (2421567), then gives back the last three so that \d{3} can match 567. The loop does this repeatedly, adding a comma between the $1 and $2 groups each time.

The $'s represent the matching groups formed in the regex with parentheses, i.e. (\d+) is $1 and (\d{3}) is $2. This makes it easy to add characters between them.

In the next iteration, the greedy \d+ stops at the newly inserted comma instead, and the loop continues until no run of more than 3 digits is left.
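
One way to convince yourself that greediness is what matters: swap \d+ for the lazy \d+? and run the same loop. This is only a sketch; groupWith is a hypothetical helper written for the comparison. With the lazy quantifier the result is exactly the 8,3,4,7,5,934 that the question expected.

```javascript
// Hypothetical helper: applies the given (non-global) regex in the same
// while loop that addCommas uses.
function groupWith(rgx, digits) {
    var s = String(digits);
    while (rgx.test(s)) {
        s = s.replace(rgx, '$1,$2');
    }
    return s;
}

// Greedy: \d+ takes everything it can, so $2 ends up being the LAST three digits.
console.log(/(\d+)(\d{3})/.exec('2421567').slice(1));  // [ '2421', '567' ]
console.log(groupWith(/(\d+)(\d{3})/, '83475934'));    // '83,475,934'

// Lazy: \d+? takes as little as it can, so $2 is the three digits after the FIRST one.
console.log(/(\d+?)(\d{3})/.exec('2421567').slice(1)); // [ '2', '421' ]
console.log(groupWith(/(\d+?)(\d{3})/, '83475934'));   // '8,3,4,7,5,934'
```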

Utopia answered Nov 11 '22 08:11


I wrote a regular expression which does the same thing in a single pass:

/(?!\b)(\d{3}(?=(\d{3})*\b))/g

Try this for example with varying numbers at the start:

var num = '1234567890123456';

for(var i = 1; i <= num.length; i++)
{
  console.log(num.slice(0, -i).replace(/(?!\b)(\d{3}(?=(\d{3})*\b))/g, ',$1'));
}

I'll try to break it down here:

Ignore this bit for now; I'll come back to it.

(?!\b)(\d{3}(?=(\d{3})*\b))
^^^^^^


It still reads from left to right, trying to capture blocks of 3 digits. Here's the capturing group.

(?!\b)(\d{3}(?=(\d{3})*\b))
      ^^^^^^^^^^^^^^^^^^^^^


However, inside the capturing group, it uses a lookahead.

(?!\b)(\d{3}(?=(\d{3})*\b))
            ^^^^^^^^^^^^^^


The lookahead looks for any multiple of 3 digits anchored to the end of the number (the terminating word boundary). This aligns the capture to multiples of 3 from the right-hand end of the number. It also means it works with decimal numbers, unless they have more than 3 decimal places, in which case it will put commas in the decimals too. It ain't perfect.

(?!\b)(\d{3}(?=(\d{3})*\b))
               ^^^^^^^^^^
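
Here is a quick sketch of that behaviour, including the decimal caveat. The wrapper name commas is just for this demo.

```javascript
var singlePass = /(?!\b)(\d{3}(?=(\d{3})*\b))/g;

// Hypothetical wrapper around the single-pass replace.
function commas(value) {
    return String(value).replace(singlePass, ',$1');
}

console.log(commas('1234567'));   // '1,234,567'
console.log(commas('1234.567'));  // '1,234.567'   (3 decimals are left alone)
console.log(commas('1234.5678')); // '1,234.5,678' (4 decimals get a comma too)
```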


The problem I had was that JavaScript didn't support look-behinds at the time, so when the number had a multiple of 3 digits, the pattern matched the first 3 digits and placed a comma at the very start of the number.
You can't match a character before the 3-digit match without throwing off the repetition, so I had to use a negative lookahead on a word boundary, (?!\b), which is equivalent to \B. It's kinda the opposite of putting ^ at the start.

(?!\b)(\d{3}(?=(\d{3})*\b))
^^^^^^


Essentially it prevents the expression from matching at the very start of the number, which would be bad.
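
To see what the (?!\b) guard buys you, this sketch compares the pattern with and without it on a number whose length is a multiple of 3. The variable names are made up for the comparison; the \B variant is included only to show that it behaves the same as (?!\b).

```javascript
var unguarded = /(\d{3}(?=(\d{3})*\b))/g;        // no guard at the front
var guarded   = /(?!\b)(\d{3}(?=(\d{3})*\b))/g;  // the pattern from this answer
var nonBound  = /\B(\d{3}(?=(\d{3})*\b))/g;      // \B is equivalent to (?!\b)

console.log('123456'.replace(unguarded, ',$1')); // ',123,456' (stray leading comma)
console.log('123456'.replace(guarded, ',$1'));   // '123,456'
console.log('123456'.replace(nonBound, ',$1'));  // '123,456'
```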

farmer-Bri answered Nov 11 '22 09:11