Regular Expressions: Match up to an optional word

Tags: c#, regex

I need to match the first part of a sentence, up to a given word. However, that word is optional; when it is absent, I want to match the whole sentence. For example:

I have a sentence with a clause I don't want.

I have a sentence and I like it.

In the first case, I want "I have a sentence". In the second case, I want "I have a sentence and I like it."

Lookarounds will give me the first case, but as soon as I try to make it optional, to cover the second case, I get the whole first sentence. I've tried making the expression lazy... no dice.

The code that works for the first case:

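using System;
using System.Text.RegularExpressions;

// Greedy .* backtracks to the last position that is followed by "with".
var regEx = new Regex(@".*(?=with)");
string matchstr = @"I have a sentence with a clause I don't want";

// Run the match once and reuse the result.
Match match = regEx.Match(matchstr);
if (match.Success) {
    Console.WriteLine(match.Value);
    Console.WriteLine("Matched!");
}
else {
    Console.WriteLine("Not Matched : (");
}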

The expression that I wish worked:

var regEx = new Regex(@".*(?=with)?");

Any suggestions?

James King asked Aug 27 '10 16:08



1 Answer

There are several ways to do this. You could do something like this:

^(.*?)(with|$)

The first group is matched reluctantly, i.e. it takes as few characters as possible. The overall match succeeds when this group is followed by either the literal with or the end of the line (the $ anchor).

Given this input:

I have a sentence with a clause I don't want.
I have a sentence and I like it.

Then there are two matches (as seen on rubular.com):

  • Match 1:
    • Group 1: "I have a sentence "
    • Group 2: "with"
  • Match 2:
    • Group 1: "I have a sentence and I like it."
    • Group 2: "" (empty string)
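
In C#, the two matches listed above can be reproduced with a sketch along these lines. It assumes both sentences sit in one string separated by "\n", which is why RegexOptions.Multiline is set: without it, .NET lets ^ and $ match only at the start and end of the whole string, whereas rubular.com runs Ruby, where they are always line-based. The names OptionalWordDemo and FirstParts are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OptionalWordDemo
{
    // Multiline makes ^ and $ match at the start and end of each line,
    // not only at the start and end of the whole input.
    private static readonly Regex Pattern =
        new Regex(@"^(.*?)(with|$)", RegexOptions.Multiline);

    // Returns group 1 of every match: the text before "with",
    // or the whole line when "with" does not occur.
    public static List<string> FirstParts(string input)
    {
        var parts = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            parts.Add(m.Groups[1].Value);
        }
        return parts;
    }

    public static void Main()
    {
        string input = "I have a sentence with a clause I don't want.\n" +
                       "I have a sentence and I like it.";
        foreach (string part in FirstParts(input))
        {
            // Quotes make the trailing space in the first result visible.
            Console.WriteLine("\"" + part + "\"");
        }
    }
}
```

Group 1 of the first match keeps the space before "with"; calling TrimEnd() on it removes that space if it is not wanted.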

You can make the grouped alternation non-capturing with (?:with|$) if you don't need to distinguish the two cases.
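
A minimal sketch of the non-capturing form, applied to one sentence at a time as in the question. The second pattern, which moves the alternation into a lookahead so that the match value itself is the wanted text, is a variant assumed here and not part of the answer above. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonCapturingDemo
{
    // (?:with|$) groups the alternation without creating a second capture group.
    private static readonly Regex Grouped = new Regex(@"^(.*?)(?:with|$)");

    // Assumed variant: the lookahead does not consume "with",
    // so the whole match is already the part we want.
    private static readonly Regex Lookahead = new Regex(@"^.*?(?=with|$)");

    public static string ViaGroup(string sentence)
    {
        // TrimEnd drops the space that precedes "with".
        return Grouped.Match(sentence).Groups[1].Value.TrimEnd();
    }

    public static string ViaLookahead(string sentence)
    {
        return Lookahead.Match(sentence).Value.TrimEnd();
    }

    public static void Main()
    {
        // prints: I have a sentence
        Console.WriteLine(ViaGroup("I have a sentence with a clause I don't want."));
        // prints: I have a sentence and I like it.
        Console.WriteLine(ViaLookahead("I have a sentence and I like it."));
    }
}
```

Both patterns also stop at "with" inside longer words such as "without"; writing \bwith\b restricts the match to the whole word.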

Related questions

  • Difference between .*? and .* for regex
polygenelubricants answered Sep 20 '22 04:09