Match regex at exact offset

Tags: c#, .net, regex

I want to check whether a certain pattern (e.g. a double-quoted string) matches at an exact position.

Example

string text = "aaabbb";
Regex regex = new Regex("b+");
// Now match regex at exactly char 3 (offset) of text

I'd like to check whether the regex matches at exactly index 3.
I had a look at the Regex.Match(String, Int32) method, but it does not behave as I expected.
So I ran some tests and tried some workarounds:

public void RegexTest2()
{
    Match m;
    string text = "aaabbb";
    int offset = 3;

    m = new Regex("^a+").Match(text, 0); // let's do a sanity check first
    Assert.AreEqual(true, m.Success);
    Assert.AreEqual("aaa", m.Value);  // works as expected

    m = new Regex("^b+").Match(text, offset);
    Assert.AreEqual(false, m.Success);  // this is quite strange...

    m = new Regex("^.{"+offset+"}(b+)").Match(text); // works, but is not very 'nice'
    Assert.AreEqual(true, m.Success);
    Assert.AreEqual("bbb", m.Groups[1].Value);

    m = new Regex("^b+").Match(text.Substring(offset)); // works too, but Substring copies the text
    Assert.AreEqual(true, m.Success);
    Assert.AreEqual("bbb", m.Value);
}

In fact, I'm starting to believe that new Regex("^.").Match(myString, 1) will never match anything.
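A quick way to check that suspicion, assuming .NET behaves as the test above shows: without RegexOptions.Multiline, ^ matches only at index 0 of the input, even when Match is given a later starting offset. This sketch (C# 9+ top-level statements; the variable names are only illustrative) tries ^. at every offset of the example string:

```csharp
using System;
using System.Text.RegularExpressions;

string text = "aaabbb";
var caret = new Regex("^.");

// At offset 0 the search starts at the beginning of the string, so ^ can match.
bool matchesAtZero = caret.Match(text, 0).Success;

// Count how many searches succeed when they start at any offset > 0.
int laterMatches = 0;
for (int offset = 1; offset < text.Length; offset++)
{
    if (caret.Match(text, offset).Success)
        laterMatches++;
}

Console.WriteLine($"offset 0: {matchesAtZero}, offsets 1..{text.Length - 1}: {laterMatches} matches");
```

If that holds, ^ is simply the wrong anchor for "start here": it means "start of the string", not "start of the search".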

Any suggestions?

Edit:

I have a working solution (a workaround), so my question is really about speed and a clean implementation.

asked Feb 13 '11 at 10:02 by Simon Ottenhaus


1 Answer

Have you tried what the docs say?

If you want to restrict a match so that it begins at a particular character position in the string and the regular expression engine does not scan the remainder of the string for a match, anchor the regular expression with a \G (at the left for a left-to-right pattern, or at the right for a right-to-left pattern). This restricts the match so it must start exactly at startat.
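To see the difference the quoted paragraph describes, here is a minimal C# sketch (top-level statements, reusing the question's "aaabbb" example) that compares an unanchored pattern with a \G-anchored one when a start offset is passed:

```csharp
using System;
using System.Text.RegularExpressions;

string text = "aaabbb";

// Unanchored: the engine starts at index 1 and scans forward until it finds "bbb" at index 3.
Match scanned = new Regex("b+").Match(text, 1);

// \G-anchored: the match must begin exactly at startat, so index 1 ('a') fails...
Match anchoredAt1 = new Regex(@"\Gb+").Match(text, 1);

// ...while index 3 (the first 'b') succeeds.
Match anchoredAt3 = new Regex(@"\Gb+").Match(text, 3);

Console.WriteLine($"b+ from 1:   {scanned.Success} at {scanned.Index} ('{scanned.Value}')");
Console.WriteLine($"\\Gb+ from 1: {anchoredAt1.Success}");
Console.WriteLine($"\\Gb+ from 3: {anchoredAt3.Success} ('{anchoredAt3.Value}')");
```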

In your test, that means replacing the ^ with \G:

m = new Regex(@"\Gb+").Match(text, offset);
Assert.AreEqual(true, m.Success);  // succeeds now: \G anchors the match at startat
Assert.AreEqual("bbb", m.Value);
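If you need this for arbitrary patterns, one option is to wrap them in \G(?:...) yourself. MatchAt below is a hypothetical helper, not part of System.Text.RegularExpressions; it is a sketch that assumes left-to-right matching (no RegexOptions.RightToLeft):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not a .NET API): returns the match of `pattern` only if it
// starts exactly at `offset`; otherwise returns an unsuccessful Match.
// Assumes a left-to-right pattern (no RegexOptions.RightToLeft).
static Match MatchAt(string pattern, string text, int offset, RegexOptions options = RegexOptions.None)
{
    // \G pins the match to the startat position. The non-capturing group keeps an
    // alternation such as "x|b" from escaping the anchor and leaves group numbers unchanged.
    var anchored = new Regex(@"\G(?:" + pattern + ")", options);
    return anchored.Match(text, offset);
}

string text = "aaabbb";

Match hit = MatchAt("b+", text, 3);   // starts at index 3: succeeds with "bbb"
Match miss = MatchAt("b+", text, 2);  // index 2 is 'a': fails, no forward scan
Match alt = MatchAt("x|b", text, 2);  // alternation stays anchored, so this fails too

Console.WriteLine($"{hit.Success} '{hit.Value}', {miss.Success}, {alt.Success}");
```

Since the question is about speed: this helper builds a new Regex on every call, so in a hot loop you would construct the anchored Regex once and reuse it. Either way, no substring is copied, unlike the Substring workaround.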
answered Sep 22 '22 at 05:09 by CAFxX