
Grouping string by comma between brackets

Tags: c#, regex

In response to: Regular Expression to find a string included between two characters while EXCLUDING the delimiters

Hi, I'm looking for a regex pattern that applies to a string containing bracketed groups:

[1,2,3,4,5] [abc,ef,g] [0,2,4b,y7]

The items could be anything: word characters, digits, and non-word characters, together or separated.

I can get each group between brackets with \[(.*?)\], but what regex pattern will give me both the group between brackets and the sub-group strings separated by commas, so that the result looks like the following?

Group1 : 1,2,3,4,5
 Group1: 1
 Group2: 2
 Group3: 3
 Group4: 4
 Group5: 5

Group2 : abc,ef,g
 Group1: abc
 Group2: ef
 Group3: g

etc ..

Thank you for your help

Myra asked Dec 17 '22 01:12


2 Answers

I agree with @Dav that you would be best off using String.Split on each square-bracketed group.
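For reference, here is a minimal sketch of that String.Split approach. It assumes the \[(.*?)\] pattern from the question is enough to find each bracketed group; the variable names (input, result) are only illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string input = "[1,2,3,4,5] [abc,ef,g] [0,2,4b,y7]";

// One string[] of items per bracketed group (illustrative name).
var result = new List<string[]>();

// Group 1 of each match holds the text between one pair of brackets.
foreach (Match m in Regex.Matches(input, @"\[(.*?)\]"))
{
    string contents = m.Groups[1].Value;

    // Split the bracket contents on commas to get the individual items.
    string[] items = contents.Split(',');
    result.Add(items);

    Console.WriteLine("Group: " + contents);
    foreach (string item in items)
    {
        Console.WriteLine("  Item: " + item);
    }
}
```

Each entry in result then corresponds to one bracketed group, so no index bookkeeping is needed afterwards.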

However, you can extract all the data using a single regular expression:

(?:\s*\[((.*?)(?:,(.+?))*)\])+

Using this expression, you will have to process all the captures of each group to get all the data. As an example, run the following code on your string:

using System;
using System.Text.RegularExpressions;

var regex = new Regex(@"(?:\s*\[((.*?)(?:,(.+?))*)\])+");
var match = regex.Match(@"[1,2,3,4,5] [abc,ef,g] [0,2,4b,y7]");

// Group 0 is the whole match, so start at group 1.
for (var i = 1; i < match.Groups.Count; i++)
{
    var group = match.Groups[i];
    Console.WriteLine("Group " + i);

    // A group keeps every capture it made while the outer (...)+ repeated.
    for (var j = 0; j < group.Captures.Count; j++)
    {
        var capture = group.Captures[j];

        Console.WriteLine("  Capture " + j + ": " + capture.Value
                                       + " at " + capture.Index);
    }
}

This produces the following output:

Group 1
  Capture 0: 1,2,3,4,5 at 1
  Capture 1: abc,ef,g at 13
  Capture 2: 0,2,4b,y7 at 24
Group 2
  Capture 0: 1 at 1
  Capture 1: abc at 13
  Capture 2: 0 at 24
Group 3
  Capture 0: 2 at 3
  Capture 1: 3 at 5
  Capture 2: 4 at 7
  Capture 3: 5 at 9
  Capture 4: ef at 17
  Capture 5: g at 20
  Capture 6: 2 at 26
  Capture 7: 4b at 28
  Capture 8: y7 at 31

Group 1 gives you the contents of each square-bracketed group, group 2 gives you the first item in each square-bracketed group, and group 3 gives you all the subsequent items. To determine which square-bracketed group an item belongs to, you have to compare the index of its capture with the index ranges of the group 1 captures.
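One way to do that index bookkeeping is sketched below. It assumes an item belongs to a bracketed group when its capture index falls inside the span of the corresponding group 1 capture; the name itemsPerGroup is made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var regex = new Regex(@"(?:\s*\[((.*?)(?:,(.+?))*)\])+");
var match = regex.Match("[1,2,3,4,5] [abc,ef,g] [0,2,4b,y7]");

// One list of items per bracketed group (illustrative name).
var itemsPerGroup = new List<List<string>>();

foreach (Capture outer in match.Groups[1].Captures)
{
    int start = outer.Index;
    int end = outer.Index + outer.Length; // exclusive
    var items = new List<string>();

    // Group 2 holds the first item of each bracketed group and
    // group 3 holds the remaining items, so visit them in that order.
    foreach (int groupNumber in new[] { 2, 3 })
    {
        foreach (Capture inner in match.Groups[groupNumber].Captures)
        {
            // Keep only the captures located inside this bracketed group.
            if (inner.Index >= start && inner.Index < end)
            {
                items.Add(inner.Value);
            }
        }
    }

    itemsPerGroup.Add(items);
    Console.WriteLine(outer.Value + " -> " + string.Join(" | ", items));
}
```

This rescans the group 2 and group 3 captures for every bracketed group, which is fine for short inputs but is not tuned for large ones.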

Phil Ross answered Dec 24 '22 01:12


Here's another option that uses CaptureCollections (the only way to do this in a single regex). Where Phil Ross's answer does it all in one match operation, this one does multiple matches. This way, all the individual-item captures are properly grouped according to the bracket pairs where they were found.

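using System;
using System.Text.RegularExpressions;

string s = @"[1,2,3,4,5] [abc,ef,g] [0,2,4b,y7] ";
// Group 1: everything between one pair of brackets.
// Group 2: a single item, captured once per item inside that pair.
Regex r = new Regex(@"\[((?:([^,\[\]]+),?)*)\]");
int matchNum = 0;
foreach (Match m in r.Matches(s))
{
  Console.WriteLine("Match {0}, Group 1: {1}", ++matchNum, m.Groups[1]);
  int captureNum = 0;
  // Each match covers one bracket pair, so these captures are already grouped.
  foreach (Capture c in m.Groups[2].Captures)
  {
    Console.WriteLine("  Group 2, Capture {0}: {1}", ++captureNum, c);
  }
}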

Output:

Match 1, Group 1: 1,2,3,4,5
  Group 2, Capture 1: 1
  Group 2, Capture 2: 2
  Group 2, Capture 3: 3
  Group 2, Capture 4: 4
  Group 2, Capture 5: 5
Match 2, Group 1: abc,ef,g
  Group 2, Capture 1: abc
  Group 2, Capture 2: ef
  Group 2, Capture 3: g
Match 3, Group 1: 0,2,4b,y7
  Group 2, Capture 1: 0
  Group 2, Capture 2: 2
  Group 2, Capture 3: 4b
  Group 2, Capture 4: y7
Alan Moore answered Dec 24 '22 00:12