
Is this a bug in .NET's Regex.Split?

I have two regular expressions, for use with Regex.Split:

(?<=\G[^,],[^,],)

and

(?<=\G([^,],){2})

When splitting the string "A,B,C,D,E,F,G,", the first one results in:

A,B, 
C,D, 
E,F, 
G, 

and the second results in:

A,B, 
A, 
C,D, 
C, 
E,F, 
E, 
G, 

What is going on here? I thought that (X){2} was always equivalent to XX, but I'm not sure anymore. In my actual problem, the pattern is quite a bit more complex and has to be repeated sixty-nine times, so writing it out by hand is less than ideal.

John Gietzen asked Oct 10 '13 02:10


2 Answers

From the documentation for Regex.Split:

If capturing parentheses are used in a Regex.Split expression, any captured text is included in the resulting string array.

The internal parentheses are capturing. Try using (?:[^,],) instead.

Explosion Pills answered Sep 28 '22 03:09


From docs:

If capturing parentheses are used in a Regex.Split expression, any captured text is included in the resulting string array.

You have a capture group in your second expression. Try non-capturing parens:

(?<=\G(?:[^,],){2})
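
For comparison, here is a minimal sketch that runs both patterns from the question side by side. The class name SplitDemo and the variable names are only illustrative, and the sketch assumes a plain console program; it prints whatever Regex.Split returns rather than asserting a result.

```csharp
using System;
using System.Text.RegularExpressions;

class SplitDemo
{
    static void Main()
    {
        const string input = "A,B,C,D,E,F,G,";

        // Capturing group: Regex.Split adds the captured text to the result,
        // which is where the extra "A,", "C,", "E," entries in the question come from.
        string[] withCapture = Regex.Split(input, @"(?<=\G([^,],){2})");

        // Non-capturing group: only the pieces between split points are returned.
        string[] withoutCapture = Regex.Split(input, @"(?<=\G(?:[^,],){2})");

        Console.WriteLine(string.Join(" | ", withCapture));
        Console.WriteLine(string.Join(" | ", withoutCapture));
    }
}
```

The second line of output should match what the question reports for the hand-repeated pattern. If rewriting every group as (?:...) is impractical in a larger pattern, passing RegexOptions.ExplicitCapture to Regex.Split should have the same effect, since unnamed parentheses then stop capturing.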
Amadan answered Sep 28 '22 01:09