
Javascript RegEx non-capturing prefix

I am trying to do some string replacement with RegEx in JavaScript. The scenario is a single-line string containing a long comma-delimited list of numbers, in which duplicates are possible.

An example string is: 272,2725,2726,272,2727,297,272 (the string may or may not end in a comma)

In this example, I am trying to match each occurrence of the whole number 272. (3 matches expected) The example regex I'm trying to use is: (?:^|,)272(?=$|,)

The problem I am having is that the second and third matches are including the leading comma, which I do not want. I am confused because I thought (?:^|,) would match, but not capture. Can someone shed light on this for me? An interesting bit is that the trailing comma is excluded from the result, which is what I want.

For what it is worth, if I were using C#, there is lookbehind syntax that does what I want: (?<=^|,). However, it appears to be unsupported in JavaScript.

Lastly, I know I could work around it using string splitting, array manipulation and rejoining, but I want to learn.
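Worth noting for later readers: ECMAScript 2018 added lookbehind assertions to JavaScript, so on an engine that implements it the C# pattern above can be used unchanged (older engines reject the pattern with a SyntaxError). A minimal sketch on the example string; list and onlyWhole272 are just illustrative names:

```javascript
// Example string from the question.
const list = '272,2725,2726,272,2727,297,272';

// (?<=^|,) is a lookbehind (ES2018+): it requires start-of-string or a comma
// before 272 but does not include it in the match.
// (?=$|,) is a lookahead that does the same for what follows.
const onlyWhole272 = /(?<=^|,)272(?=$|,)/g;

console.log(list.match(onlyWhole272));        // [ '272', '272', '272' ]
console.log(list.replace(onlyWhole272, 'X')); // X,2725,2726,X,2727,297,X
```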

asked May 04 '11 15:05 by Derek Bromenshenkel


2 Answers

Use word boundaries instead:

\b272\b

ensures that 272 matches only as a whole number, not as part of 2725.
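A quick sketch of this on the question's string (the replacement text X is arbitrary). Since \b is zero-width, neither comma ends up in the match. Keep in mind that \b only separates word characters [A-Za-z0-9_] from everything else, so it would also accept 272 in text like 1.272 or -272; for a plain comma-separated list of integers that is not an issue.

```javascript
const list = '272,2725,2726,272,2727,297,272';

// \b matches a position between a word character and a non-word character
// (or the edge of the string), so commas are never consumed and 2725 is
// rejected because there is no boundary between '2' and '5'.
const whole272 = /\b272\b/g;

console.log(list.match(whole272));        // [ '272', '272', '272' ]
console.log(list.replace(whole272, 'X')); // X,2725,2726,X,2727,297,X
```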

(?:...) matches and doesn't capture, but whatever it matches is still part of the overall match.
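Running the question's own pattern shows this: the non-capturing group still consumes the comma. The second half shows a common workaround that also runs on engines without lookbehind: capture the prefix and write it back with $1 in the replacement. That is a standard String.prototype.replace idiom rather than something this answer proposes, and nonCapturing and capturedPrefix are illustrative names.

```javascript
const list = '272,2725,2726,272,2727,297,272';

// The question's pattern: (?:^|,) creates no capture group, but the comma
// it matches is still consumed and therefore appears in each match.
const nonCapturing = /(?:^|,)272(?=$|,)/g;
console.log(list.match(nonCapturing)); // [ '272', ',272', ',272' ]

// Pre-ES2018 workaround: capture the prefix (empty string or comma)
// and put it back in front of the replacement text via $1.
const capturedPrefix = /(^|,)272(?=$|,)/g;
console.log(list.replace(capturedPrefix, '$1X')); // X,2725,2726,X,2727,297,X
```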

A lookaround assertion like (?=...) is different: It only checks if it is possible (or impossible) to match the enclosed regex at the current point, but it doesn't add to the overall match.
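A small sketch that makes the difference visible: with the g flag, exec() leaves lastIndex at the end of the match, so it shows how many characters were actually consumed. The regex names are illustrative.

```javascript
// The comma is matched literally, so it is consumed.
const consumed = /272,/g;
// The comma is only checked by a lookahead, so it is not consumed.
const asserted = /272(?=,)/g;

const m1 = consumed.exec('272,5');
const m2 = asserted.exec('272,5');

console.log(m1[0], consumed.lastIndex); // 272, 4
console.log(m2[0], asserted.lastIndex); // 272 3
```

This is also why the trailing comma in the question's pattern (?=$|,) never showed up in the result.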

answered Nov 19 '22 05:11 by Tim Pietzcker


Here is a way to create a JavaScript lookbehind that has worked in all the cases I needed.

This is an example. One can do many more complex and flexible things.

The main point here is that, in some cases, it is possible to build a non-capturing prefix (lookbehind-like) construct for a RegExp in JavaScript.

This example is designed to extract all fields surrounded by braces '{...}' that consist of a single word character repeated five times. The braces are not returned with the field.

This is just an example to show the idea at work, not necessarily the basis for an application.

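    function testGetSingleRepeatedCharacterInBraces()
      {
        var leadingHtmlSpaces = '&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;' ;
        // '(?:\b|\B(?={))' is meant to act as a non-capturing prefix,
        // following the general shape (?:\b|\B(?=WhateverYouLike)).
        // It is zero-width, so nothing before the field ends up in the match.
        // Caveat: it only inspects what follows, not what precedes. Here the
        // \B(?={) branch can never succeed (the next character would have to
        // be both '{' and a word character), so the prefix behaves like \b:
        // the field must follow a non-word character such as '{'.
        // (([0-9a-zA-Z_])\2{4}) = one word character repeated five times;
        // (?=}) requires a closing brace without consuming it.
        var regex  = /(?:\b|\B(?={))(([0-9a-zA-Z_])\2{4})(?=})/g ;
        var string = '' ;

        string = 'Message has no fields' ;
        document.write( 'String => "' + string
                                      + '"<br>'  + leadingHtmlSpaces + 'fields => '
                                      + getMatchingFields( string, regex )
                                      + '<br>' ) ;

        string = '{LLLLL}Message {11111}{22222} {ffffff}abc def{EEEEE} {_____} {4444} {666666} {55555}' ;
        document.write( 'String => "' + string
                                      + '"<br>'  + leadingHtmlSpaces + 'fields => '
                                      + getMatchingFields( string, regex )
                                      + '<br>' ) ;
      }

    function getMatchingFields( stringToSearch, regex )
      {
         // String.prototype.match returns null when nothing matches;
         // return an empty array instead so the caller can always print it.
         var matches = stringToSearch.match( regex ) ;
         return matches ? matches : [] ;
      }

    // Run in a browser page (the output is written with document.write).
    testGetSingleRepeatedCharacterInBraces() ;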

    Output:
    String => "Message has no fields"
         fields =>
    String => "{LLLLL}Message {11111}{22222} {ffffff}abc def{EEEEE} {_____} {4444} {666666} {55555}"
         fields => LLLLL,11111,22222,EEEEE,_____,55555
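A small check of what the pattern above actually enforces, as a sketch. Because the \B(?={) alternative needs the next character to be '{' while the group that follows must start with a word character, that branch never leads to a match, so in practice the prefix behaves like a plain \b and a field preceded by a space instead of '{' is returned too. If the opening brace really has to be checked on an engine without lookbehind, one option (a different technique from this answer's) is to include both braces in the match and read the capture group. getFieldsInBraces and fieldRegex are hypothetical names chosen for this example.

```javascript
// The answer's pattern, unchanged.
const answerRegex = /(?:\b|\B(?={))(([0-9a-zA-Z_])\2{4})(?=})/g;

// No '{' before AAAAA, yet it still matches: the prefix is effectively \b.
console.log('x AAAAA}'.match(answerRegex)); // [ 'AAAAA' ]

// Alternative without lookbehind: consume both braces, keep only group 1.
function getFieldsInBraces(text) {
  const fieldRegex = /\{(([0-9a-zA-Z_])\2{4})\}/g;
  const fields = [];
  let m;
  // exec() with the g flag advances lastIndex after each match.
  while ((m = fieldRegex.exec(text)) !== null) {
    fields.push(m[1]); // the field text, without the braces
  }
  return fields;
}

const sample = '{LLLLL}Message {11111}{22222} {ffffff}abc def{EEEEE} {_____} {4444} {666666} {55555}';
console.log(getFieldsInBraces(sample).join(',')); // LLLLL,11111,22222,EEEEE,_____,55555
console.log(getFieldsInBraces('x AAAAA}'));        // []
```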
answered Nov 19 '22 04:11 by user1895776