RegEx Match multiple times in string

Tags: c#, regex

I'm trying to extract the values in a string that sit between << and >>, but they can occur multiple times.

Can anyone help with a regular expression to match these:

this is a test for <<bob>> who like <<books>> test 2 <<frank>> likes nothing test 3 <<what>> <<on>> <<earth>> <<this>> <<is>> <<too>> <<much>>. 

I then want to loop over the GroupCollection with foreach to get all the values.

Any help gratefully received. Thanks.

Mike Mengell asked Feb 03 '11 22:02


2 Answers

Use positive lookbehind and lookahead assertions to require the angle brackets without including them in the match, and use .*? to match the shortest possible sequence of characters between them. Find all values by iterating over the MatchCollection returned by the Matches() method.

```csharp
using System;
using System.Text.RegularExpressions;

Regex regex = new Regex("(?<=<<).*?(?=>>)");
foreach (Match match in regex.Matches(
    "this is a test for <<bob>> who like <<books>>"))
{
    Console.WriteLine(match.Value);
}
```

LiveDemo in DotNetFiddle
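The ? after .* is what keeps each match inside a single pair of brackets. A minimal sketch, assuming .NET 5 or later (C# 9 top-level statements), that contrasts a greedy .* with the lazy .*? on the same input; the variable names are illustrative only:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string input = "this is a test for <<bob>> who like <<books>>";

// Greedy: .* runs as far as possible, then backs off to the LAST ">>",
// so the text between the two pairs ends up inside a single match.
string[] greedy = Regex.Matches(input, "(?<=<<).*(?=>>)")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Lazy: .*? stops at the FIRST ">>", so each pair yields its own match.
string[] lazy = Regex.Matches(input, "(?<=<<).*?(?=>>)")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(" | ", greedy)); // bob>> who like <<books
Console.WriteLine(string.Join(" | ", lazy));   // bob | books
```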

heijp06 answered Sep 28 '22 01:09


While Peter's answer is a good example of using lookarounds for left- and right-hand context checking, I'd like to add a LINQ (lambda) way to access matches/groups, and show the use of simple numbered capturing groups, which come in handy when you want to extract only part of the pattern:

```csharp
using System.Linq;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// ...

var results = Regex.Matches(s, @"<<(.*?)>>", RegexOptions.Singleline)
    .Cast<Match>()
    .Select(x => x.Groups[1].Value);
```

The same approach with Peter's Regex instance, where the whole match value is accessed via Match.Value:

```csharp
var results = regex.Matches(s).Cast<Match>().Select(x => x.Value);
```
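Match.Value is the same string as Groups[0].Value, while Groups[1] holds only what (.*?) captured. This is also how the question's "foreach over the GroupCollection" idea maps onto the API: Matches() returns one MatchCollection, and each Match in it carries its own GroupCollection. A minimal sketch, assuming .NET 5 or later for top-level statements; the list names are illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string sample = "this is a test for <<bob>> who like <<books>>";
var whole = new List<string>(); // Groups[0]: the entire match, brackets included
var inner = new List<string>(); // Groups[1]: only the text captured by (.*?)

// One Match per <<...>> pair; each Match exposes its own Groups.
foreach (Match m in Regex.Matches(sample, @"<<(.*?)>>"))
{
    whole.Add(m.Groups[0].Value); // same string as m.Value
    inner.Add(m.Groups[1].Value);
}

Console.WriteLine(string.Join(", ", whole)); // <<bob>>, <<books>>
Console.WriteLine(string.Join(", ", inner)); // bob, books
```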

Note:

  • <<(.*?)>> is a regex that matches <<, then captures any 0 or more chars, as few as possible (due to the non-greedy *? quantifier), into Group 1, and then matches >>
  • RegexOptions.Singleline makes . match newline (LF) chars too (by default it does not)
  • Cast<Match>() casts the match collection to an IEnumerable<Match> that you can then query with lambdas
  • Select(x => x.Groups[1].Value) returns only the Group 1 value of the current match object x
  • You may further turn the obtained values into a list or an array by adding .ToList() or .ToArray() after Select.
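To see what RegexOptions.Singleline changes, here is a minimal sketch (top-level statements, .NET 5 or later assumed; the input string is a made-up example) with one value that spans a line break. It also uses .ToArray() as mentioned in the last bullet:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: the first value contains an LF inside the brackets.
string text = "<<first\nline>> and <<second>>";

// Without Singleline, '.' cannot cross the '\n', so the first pair is skipped.
string[] withoutSingleline = Regex.Matches(text, @"<<(.*?)>>")
    .Cast<Match>()
    .Select(x => x.Groups[1].Value)
    .ToArray();

// With Singleline, '.' also matches '\n', so both pairs are captured.
string[] withSingleline = Regex.Matches(text, @"<<(.*?)>>", RegexOptions.Singleline)
    .Cast<Match>()
    .Select(x => x.Groups[1].Value)
    .ToArray();

Console.WriteLine(withoutSingleline.Length); // 1
Console.WriteLine(withSingleline.Length);    // 2
```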

In the demo C# code, string.Join(", ", results) generates a comma-separated string of the Group 1 values:

```csharp
var strs = new List<string>
{
    "this is a test for <<bob>> who like <<books>>",
    "test 2 <<frank>> likes nothing",
    "test 3 <<what>> <<on>> <<earth>> <<this>> <<is>> <<too>> <<much>>."
};
foreach (var s in strs)
{
    var results = Regex.Matches(s, @"<<(.*?)>>", RegexOptions.Singleline)
        .Cast<Match>()
        .Select(x => x.Groups[1].Value);
    Console.WriteLine(string.Join(", ", results));
}
```

Output:

```
bob, books
frank
what, on, earth, this, is, too, much
```
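On the question's full sample string, the lookaround approach (whole match) and the capturing-group approach (Group 1) return the same values; they are not guaranteed to agree on every input, for example when a value contains a newline and only one of them uses Singleline. A minimal sketch, assuming .NET 5 or later for top-level statements, that collects both into a List<string> and compares them:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Full sample text from the question.
string s = "this is a test for <<bob>> who like <<books>> test 2 <<frank>> likes nothing " +
           "test 3 <<what>> <<on>> <<earth>> <<this>> <<is>> <<too>> <<much>>.";

// Lookarounds keep << and >> out of the match, so Match.Value is the bare value.
var regex = new Regex("(?<=<<).*?(?=>>)");
List<string> viaLookarounds = regex.Matches(s)
    .Cast<Match>()
    .Select(x => x.Value)
    .ToList();

// Here the brackets are part of the match; Group 1 holds only the inner text.
List<string> viaGroup = Regex.Matches(s, @"<<(.*?)>>", RegexOptions.Singleline)
    .Cast<Match>()
    .Select(x => x.Groups[1].Value)
    .ToList();

Console.WriteLine(string.Join(", ", viaLookarounds)); // bob, books, frank, what, on, earth, this, is, too, much
Console.WriteLine(viaLookarounds.SequenceEqual(viaGroup)); // True
```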
Wiktor Stribiżew answered Sep 28 '22 02:09