
Preventing duplicate matches in RegEx

The following code

using System.Text.RegularExpressions;

string expression = @"(\{[0-9]+\})";
RegexOptions options = RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline | RegexOptions.IgnoreCase;
Regex tokenParser = new Regex(expression, options);

MatchCollection matches = tokenParser.Matches("The {0} is a {1} and the {2} is also a {1}");

will match and capture "{0}", "{1}", "{2}" and "{1}".

Is it possible to change this (either the regular expression or the RegexOptions) so that it matches and captures only "{0}", "{1}" and "{2}"? In other words, can each distinct token be captured only once?
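For what it's worth, the pattern itself can be made to skip repeats with a negative lookahead containing a backreference, `(?!.*\1)`, which rejects a token if the same text occurs again later in the input. The following is a minimal sketch assuming .NET's `System.Text.RegularExpressions`; note that it keeps the *last* occurrence of each token, not the first, so the order of the results changes.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// (?!.*\1): reject this token if the same text appears again later in the input,
// so only the LAST occurrence of each distinct token is matched.
// Singleline lets '.' cross line breaks, so repeats on later lines are also seen.
var tokenParser = new Regex(@"(\{[0-9]+\})(?!.*\1)", RegexOptions.Singleline);

string[] tokens = tokenParser
    .Matches("The {0} is a {1} and the {2} is also a {1}")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(", ", tokens)); // {0}, {2}, {1}
```

Each candidate match rescans the rest of the input, so on long strings deduplicating the matches afterwards is likely to be cheaper and easier to read.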

asked Oct 17 '25 at 21:10 by Steve Crane



2 Answers

Here is what I came up with.

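using System;
using System.Linq;
using System.Text.RegularExpressions;

// True when both strings contain the same set of distinct tokens.
private static bool TokensMatch(string t1, string t2)
{
  return TokenString(t1) == TokenString(t2);
}

// Builds a canonical string from the distinct tokens, sorted so their order does not matter.
private static string TokenString(string input)
{
  Regex tokenParser = new Regex(@"(\{[0-9]+\})|(\[.*?\])");

  string[] tokens = tokenParser.Matches(input).Cast<Match>()
      .Select(m => m.Value).Distinct().OrderBy(s => s).ToArray();

  return String.Join(String.Empty, tokens);
}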

Note that the regular expression differs from the one in my question because I handle two kinds of token: numbered ones delimited by {} and named ones delimited by [].
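The same comparison can also be sketched without building a joined string, by comparing the token sets directly with `HashSet<string>.SetEquals`. This is a minimal, self-contained variant under that assumption; `TokenSet` is a hypothetical helper name, and the pattern is the one from the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Same token pattern as above: numbered {n} tokens or named [name] tokens.
var tokenParser = new Regex(@"(\{[0-9]+\})|(\[.*?\])");

// Hypothetical helper: collect the distinct tokens of a string into a set (ordinal comparison).
HashSet<string> TokenSet(string input) =>
    new HashSet<string>(
        tokenParser.Matches(input).Cast<Match>().Select(m => m.Value),
        StringComparer.Ordinal);

// Two strings "match" when they use exactly the same set of tokens,
// regardless of order or how often each token repeats.
bool TokensMatch(string t1, string t2) => TokenSet(t1).SetEquals(TokenSet(t2));

Console.WriteLine(TokensMatch("The {0} is a [name] and a {1}", "{1}, {0}, [name], {1}")); // True
Console.WriteLine(TokensMatch("The {0} is a {1}", "The {0} is a {2}"));                  // False
```

Using a set avoids sorting and string concatenation, and makes the "same tokens, any order" intent explicit.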

answered Oct 19 '25 at 09:10 by Steve Crane



Regular expressions solve lots of problems, but not every problem. Instead of making the regex skip repeated tokens, how about deduplicating the matches with other tools in the toolbox?

var parameters = new HashSet<string>(
    matches.Cast<Match>().Select(mm => mm.Value));

Or

var parameters = matches.Cast<Match>().Select(mm => mm.Value).Distinct();
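A self-contained sketch of both options, run against the input from the question. It assumes `Cast<Match>()` is needed because on .NET Framework `MatchCollection` only implements the non-generic `IEnumerable`. The ordering note reflects how LINQ to Objects' `Distinct()` behaves in practice (first occurrence wins), although its documentation describes the result as an unordered sequence; `HashSet<T>` makes no ordering promise at all.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var tokenParser = new Regex(@"(\{[0-9]+\})");
MatchCollection matches = tokenParser.Matches("The {0} is a {1} and the {2} is also a {1}");

// Cast<Match>() gives a typed sequence even where MatchCollection is non-generic.
IEnumerable<string> values = matches.Cast<Match>().Select(m => m.Value);

// Option 1: a set of distinct tokens; enumeration order is not guaranteed.
var parameterSet = new HashSet<string>(values);

// Option 2: Distinct(); in practice yields each value at its first occurrence.
List<string> parameters = values.Distinct().ToList();

Console.WriteLine(string.Join(", ", parameters)); // {0}, {1}, {2}
```

Pick the set when you only need membership tests, and `Distinct()` when you want a sequence to iterate in roughly source order.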

answered Oct 19 '25 at 09:10 by user7116
