Determining which pattern matched using Regex.Matches

Tags: c#, regex

I'm writing a translator, not as any serious project, just for fun and to become a bit more familiar with regular expressions. From the code below I think you can work out where I'm going with this (cheezburger anyone?).

I'm using a dictionary whose keys are regular expressions and whose values are List<string> collections of possible replacements. To pick the substitute I need to know which key matched. How can I work out which pattern triggered the match?

        var dictionary = new Dictionary<string, List<string>>
        {
            {"(?<!e)ight", new List<string>(){"ite"}},
            {"(?<!ues)tion", new List<string>(){"shun"}},
            {"(?:god|allah|buddah?|diety)", new List<string>(){"ceiling cat"}},
            ..
        };

        var regex = "(" + String.Join(")|(", dictionary.Keys.ToArray()) + ")";

        foreach (Match metamatch in Regex.Matches(input
           , regex
           , RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture))
        {
            substitute = GetRandomReplacement(dictionary[ ????? ]);
            input = input.Replace(metamatch.Value, substitute);
        }

Is what I'm attempting possible, or is there a better way to achieve this insanity?

asked Jun 24 '10 16:06 by Andrew


2 Answers

You can name each capture group in the regular expression and then check which named group took part in the match. That tells you which pattern matched.

For example, wrap a pattern in a named group:

(?<Group1>(?<!e)ight)

and then test that group on the match result:

match.Groups["Group1"].Success
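
As a rough sketch of how this could be applied to several patterns at once, each pattern can be wrapped in its own generated group name before the alternatives are joined. The group names p0, p1, ... and the class name are made up for this illustration, and the patterns assume the asker meant lookbehinds rather than lookaheads.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class NamedGroupLookup
{
    static void Main()
    {
        // Pattern/replacement pairs; a list keeps the index order stable.
        var patterns = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>(@"(?<!e)ight", "ite"),
            new KeyValuePair<string, string>(@"(?<!ues)tion", "shun"),
        };

        // Build (?<p0>...)|(?<p1>...) ; the names p0, p1 are hypothetical.
        string combined = string.Join("|",
            patterns.Select((p, i) => "(?<p" + i + ">" + p.Key + ")").ToArray());

        // ExplicitCapture still captures named groups, only unnamed ones are skipped.
        var regex = new Regex(combined,
            RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

        foreach (Match m in regex.Matches("A slight motion at night"))
        {
            // Find the named group that participated in this match.
            for (int i = 0; i < patterns.Count; i++)
            {
                if (m.Groups["p" + i].Success)
                {
                    Console.WriteLine(m.Value + " matched pattern " + i
                        + ", replacement: " + patterns[i].Value);
                    break;
                }
            }
        }
    }
}
```

Generating the names from the index avoids having to invent a valid group name for every pattern, at the cost of a linear scan over the groups for each match.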
answered Sep 21 '22 09:09 by Jeff Yates


You've got another problem. Check this out:

string s = @"My weight is slight.";
Regex r = new Regex(@"(?<!e)ight\b");
foreach (Match m in r.Matches(s))
{
  s = s.Replace(m.Value, "ite");
}
Console.WriteLine(s);

output:

My weite is slite.

String.Replace is a global operation, so even though weight doesn't match the regex, it gets changed anyway when slight is found. You need to do the match, lookup, and replace at the same time; Regex.Replace(String, MatchEvaluator) will let you do that.
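
A minimal sketch of that approach, reusing the string and pattern from the example above (the class name is only for illustration): the evaluator is called once per match, so text the regex did not match is left alone.

```csharp
using System;
using System.Text.RegularExpressions;

class EvaluatorReplace
{
    static void Main()
    {
        string s = "My weight is slight.";
        var r = new Regex(@"(?<!e)ight\b");

        // The lambda is converted to a MatchEvaluator. It receives each
        // Match and returns the text that should replace it.
        string result = r.Replace(s, m => "ite");

        Console.WriteLine(result); // prints "My weight is slite."
    }
}
```

Inside the evaluator the Match object is available, so the dictionary lookup and the choice of a random replacement could be done there instead of returning a fixed string.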

answered Sep 24 '22 09:09 by Alan Moore