I need to run a C# regex match on a string. The problem is that I'm looking for more than one pattern in a single string, and I can't find a way to do that in a single run.
For example, in the string
The dog has jumped
I'm looking for "dog" and for "dog has".
I don't know how I can get both results in one pass.
I've tried to combine the patterns with the alternation symbol (|), like this:
(dog|dog has)
But it returned only the first match.
What can I use to get back both the matches?
Thanks!
The regex engine returns the first substring that satisfies the pattern: at each position it tries the alternatives from left to right and stops at the first one that succeeds. If you write (dog|dog has), it will never be able to match "dog has", because "dog has" starts with "dog", which is the first alternative. Furthermore, the regex engine won't return overlapping matches.
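A minimal sketch of that behavior, using the question's input string. It assumes C# 9 or later (top-level statements); the variable names are only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

var input = "The dog has jumped";

// Alternatives are tried left to right at each position and the first one
// that succeeds wins, so "dog" shadows "dog has".
var shortFirst = Regex.Match(input, "dog|dog has");

// Putting the longer alternative first lets it win when both could match.
var longFirst = Regex.Match(input, "dog has|dog");

// Matches() does not return overlapping matches: once "dog has" is consumed,
// scanning resumes after it, so "dog" at the same index is not reported.
var all = Regex.Matches(input, "dog has|dog");

Console.WriteLine(shortFirst.Value);  // dog
Console.WriteLine(longFirst.Value);   // dog has
Console.WriteLine(all.Count);         // 1
```

Reordering the alternatives changes which one wins, but either way you get a single result per position, which is why a plain alternation can't return both "dog" and "dog has".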
Here's a convoluted method:
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

var patterns = new[] { "dog", "dog has" };

// Build one optional lookahead per pattern, each capturing into a named group p0, p1, ...
var sb = new StringBuilder();
for (var i = 0; i < patterns.Length; i++)
    sb.Append(@"(?=(?<p").Append(i).Append(">").Append(patterns[i]).Append("))?");
var regex = new Regex(sb.ToString(), RegexOptions.Compiled);
Console.WriteLine("Pattern: {0}", regex);

var input = "a dog has been seen with another dog";
Console.WriteLine("Input: {0}", input);

// Every match is empty; the captured groups tell which patterns matched at that position.
foreach (var match in regex.Matches(input).Cast<Match>())
{
    for (var i = 0; i < patterns.Length; i++)
    {
        var group = match.Groups["p" + i];
        if (!group.Success)
            continue;
        Console.WriteLine("Matched pattern #{0}: '{1}' at index {2}", i, group.Value, group.Index);
    }
}
This produces the following output:
Pattern: (?=(?<p0>dog))?(?=(?<p1>dog has))?
Input: a dog has been seen with another dog
Matched pattern #0: 'dog' at index 2
Matched pattern #1: 'dog has' at index 2
Matched pattern #0: 'dog' at index 33
Yes, this is an abuse of the regex engine :)
This works by building a pattern out of optional lookaheads, which capture the substrings as a side effect, while the pattern as a whole always matches an empty string. So there are n+1 total matches, n being the input length. The patterns cannot contain numbered backreferences, because group numbers shift when the patterns are concatenated, but you can use named backreferences instead.
Also, this can return overlapping matches, as it will try to match all patterns at all string positions.
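To see the n+1 empty matches in isolation, here is a small sketch that uses just one optional lookahead of the same shape as the generated pattern. The short input string "a dog" is an arbitrary choice for illustration; C# 9 or later (top-level statements) is assumed.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var input = "a dog";

// One optional lookahead, shaped like a single piece of the generated pattern.
// The lookahead is zero-width and optional, so the overall match is always empty.
var regex = new Regex(@"(?=(?<p0>dog))?");
var matches = regex.Matches(input).Cast<Match>().ToList();

// The engine reports one empty match at every position from 0 to input.Length.
Console.WriteLine(matches.Count);                   // 6 (input.Length + 1)
Console.WriteLine(matches.All(m => m.Length == 0)); // True
```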
You should definitely benchmark this against a manual approach (looping over the patterns and matching each one separately). I don't expect this to be fast...
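For comparison, a sketch of that manual approach: one Regex per pattern, each run over the whole input, with the results merged and sorted by index. The patterns and input are the ones used above; sorting by index and then by pattern number is just a choice made here so the output lines up with the lookahead version. C# 9 or later (top-level statements) is assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var patterns = new[] { "dog", "dog has" };
var input = "a dog has been seen with another dog";

// Build one Regex per pattern up front so each is parsed only once.
var regexes = patterns.Select(p => new Regex(p)).ToArray();

// Run every pattern separately; matches from different patterns may overlap.
var results = new List<(int Pattern, int Index, string Value)>();
for (var i = 0; i < regexes.Length; i++)
{
    foreach (Match m in regexes[i].Matches(input))
        results.Add((i, m.Index, m.Value));
}

// Order by position, then by pattern number.
foreach (var r in results.OrderBy(x => x.Index).ThenBy(x => x.Pattern))
    Console.WriteLine("Matched pattern #{0}: '{1}' at index {2}", r.Pattern, r.Value, r.Index);
```

This runs the input once per pattern, so it is worth timing both versions on realistic data before choosing one.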
You can use one regex pattern to look for both. The longer alternative comes first, so "dog has" is preferred over "dog" when both could match.
Pattern: (dog\b has\b)|(dog\b)
I worked out this pattern with an online regex builder.
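Then you can use it in C# with the Regex class by doing something like this (note the verbatim @"..." string: in a regular C# string literal, "\b" is a backspace character, not a word boundary):
using System.Text.RegularExpressions;

string input = "The dog has jumped";
Regex reg = new Regex(@"(dog\b has\b)|(dog\b)", RegexOptions.IgnoreCase);
if (reg.IsMatch(input))
{
    // found "dog" or "dog has"
}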
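If you also need to know which alternative matched, the capture groups tell you. This is a sketch assuming C# 9 or later (top-level statements); the helper name Describe is hypothetical. Note that it still reports one result per position, so "dog has" is returned instead of "dog", not in addition to it.

```csharp
using System;
using System.Text.RegularExpressions;

var reg = new Regex(@"(dog\b has\b)|(dog\b)", RegexOptions.IgnoreCase);

// Hypothetical helper: group 1 is "dog has", group 2 is "dog";
// only the alternative that actually matched has Success == true.
string Describe(string input)
{
    var m = reg.Match(input);
    if (!m.Success) return "no match";
    return m.Groups[1].Success ? "dog has" : "dog";
}

Console.WriteLine(Describe("The dog has jumped")); // dog has
Console.WriteLine(Describe("The dog jumped"));     // dog
Console.WriteLine(Describe("The cat jumped"));     // no match
```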