
Extracting tokens from a string with regular expressions in .NET

I'm curious if this is even possible with Regex. I want to extract tokens from a string similar to:

Select a [COLOR] and a [SIZE].

OK, easy enough: I can use (\[[A-Z]+\])

However, I want to also extract the text between the tokens. Basically, I want the matched groups for the above to be:

"Select a "
"[COLOR]"
" and a "
"[SIZE]"
"."

What's the best approach for this? If there's a way to do this with RegEx, that would be great. Otherwise, I'm guessing I have to extract the tokens, then manually loop through the MatchCollection and parse out the substrings based on the indexes and lengths of each Match. Please note I need to preserve the order of the strings and tokens. Is there a better algorithm to do this sort of string parsing?
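For reference, the manual fallback described above could look roughly like the sketch below. The class and method names (ManualTokenizer, Tokenize) are made up for illustration, and the sketch assumes tokens are always uppercase letters in square brackets. It walks the matches in order and uses each Match's Index and Length to recover the text in between.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ManualTokenizer
{
    // Hypothetical helper: returns literal text and [TOKEN] markers in their original order.
    public static List<string> Tokenize(string input)
    {
        var parts = new List<string>();
        int position = 0; // index just past the previous match

        foreach (Match match in Regex.Matches(input, @"\[[A-Z]+\]"))
        {
            // Literal text between the previous token and this one, if any.
            if (match.Index > position)
            {
                parts.Add(input.Substring(position, match.Index - position));
            }

            parts.Add(match.Value);
            position = match.Index + match.Length;
        }

        // Trailing text after the last token, if any.
        if (position < input.Length)
        {
            parts.Add(input.Substring(position));
        }

        return parts;
    }

    public static void Main()
    {
        foreach (string part in Tokenize("Select a [COLOR] and a [SIZE]."))
        {
            Console.WriteLine("\"" + part + "\"");
        }
    }
}
```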

Mike Christensen asked May 02 '11 08:05


1 Answer

Use Regex.Split(s, @"(\[[A-Z]+\])"). It should give you exactly the array you're after. When the pattern contains a capturing group, Split includes the captured text as elements of the result array, in order, between the pieces it splits off.
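A minimal sketch of this, using the sample string from the question. The class and method names (TokenSplitDemo, SplitTokens) are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TokenSplitDemo
{
    // The parentheses make the token a capturing group, so Split keeps
    // each matched token in the returned array instead of discarding it.
    public static string[] SplitTokens(string input)
    {
        return Regex.Split(input, @"(\[[A-Z]+\])");
    }

    public static void Main()
    {
        foreach (string part in SplitTokens("Select a [COLOR] and a [SIZE]."))
        {
            Console.WriteLine("\"" + part + "\"");
        }
    }
}
```

One thing to watch for: if the input starts or ends with a token, Split puts an empty string at that end of the array, so you may want to filter out empty entries.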

Kobi answered Nov 17 '22 23:11