
C# regular expression match at a specific index in a string?

Tags: c#, regex

I'd like to test whether a regex matches part of a string at a specific index (and only starting at that index). For example, given the string "one two 3 4 five", I'd like to know that, at index 8, the regular expression [0-9]+ matches "3". Regex.IsMatch and Regex.Match both take a starting index, but both will search the entire rest of the string for a match if necessary.

string text = "one two 3 4 five";
Regex num = new Regex("[0-9]+");

// Unfortunately, num.IsMatch(text, 0) also finds a match and returns true.
Console.WriteLine("{0} {1}", num.IsMatch(text, 8), num.IsMatch(text, 0)); // True True

Obviously, I could check if the resulting match starts at the index I am interested in, but I will be doing this a large number of times on large strings, so I don't want to waste time searching for matches later on in the string. Also, I won't know in advance what regular expressions I will actually be testing against the string.

I don't want to:

  1. split the string on some boundary like whitespace because in my situation I won't know in advance what a suitable boundary would be
  2. have to modify the input string in any way (like getting the substring at index 8 and then using ^ in the regex)
  3. search the rest of the string for a match or do anything else that wouldn't be performant for a large number of tests against a large string.

My goal is to parse a potentially large, user-supplied body of text using an arbitrary user-supplied grammar. The grammar will be defined in a BNF- or PEG-like syntax, and the terminals will be either string literals or regular expressions. I will therefore need to check whether the next part of the string matches any of the potential terminals, as driven by the grammar.

Rngbus asked Aug 11 '09 20:08


2 Answers

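How about using Regex.IsMatch(string, int) with a regular expression that starts with \G? \G requires the match to begin where the previous match ended or, if there was no previous match, at the position where matching started, which here is the start index you pass in.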

That appears to work:

using System;
using System.Text.RegularExpressions;

class Test
{
    static void Main()
    {
        string text="one two 3 4 five";
        Regex num=new Regex(@"\G[0-9]+");

        Console.WriteLine("{0} {1}",
                          num.IsMatch(text, 8), // True
                          num.IsMatch(text, 0)); // False
    }
}
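Since the question says the patterns are not known in advance, the same idea can be applied to an arbitrary pattern by wrapping it before compiling. The following is only a sketch: the AnchoredMatcher class and its Anchor and MatchLengthAt methods are hypothetical names made up for illustration, not part of .NET. The non-capturing group matters because without it a pattern such as one|two would become \Gone|two, leaving the second alternative unanchored.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class AnchoredMatcher
{
    // Wraps a user-supplied pattern so it can only match at the start index.
    // The (?: ... ) group keeps top-level alternations under the \G anchor.
    public static Regex Anchor(string pattern)
    {
        return new Regex(@"\G(?:" + pattern + ")");
    }

    // Returns the length of the match that begins exactly at index,
    // or -1 if the pattern does not match there.
    public static int MatchLengthAt(Regex anchored, string text, int index)
    {
        Match m = anchored.Match(text, index);
        return m.Success ? m.Length : -1;
    }
}
```

A parser could compile each terminal once with Anchor and then call MatchLengthAt at the current position to decide how far to advance. One assumption to keep in mind: a pattern that ends in an unterminated # comment under RegexOptions.IgnorePatternWhitespace would swallow the closing parenthesis, so such input would need separate handling.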
Jon Skeet answered Sep 28 '22 05:09


If you only want to search part of the text, take that substring before applying the regex.

// Match only against the 10 characters starting at index 8.
myRegex.Match(myString.Substring(8, 10));
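Taking a substring does not by itself pin the match to the start of that substring, because the regex can still match further along. A rough sketch of one way to pin it is to prefix the pattern with \A. The SubstringAnchorDemo class and MatchesAtStartOfSlice method are hypothetical names used only for illustration. Note that Substring allocates a copy of the text and that match positions are then relative to the copy, which is the cost the question was hoping to avoid.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class SubstringAnchorDemo
{
    // Copies the tail of the text starting at index, then requires the
    // pattern to match at the very beginning of that copy via \A.
    public static bool MatchesAtStartOfSlice(string text, int index, string pattern)
    {
        string slice = text.Substring(index);
        return Regex.IsMatch(slice, @"\A(?:" + pattern + ")");
    }
}
```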
Rob Elliott answered Sep 28 '22 04:09