
Regex index in matching string where the match failed

Tags: c#, regex

Is it possible to find the index in a given string at which a Regex failed while trying to match it?

For example, if my regex was "abc" and I tried to match that with "abd" the match would fail at index 2.

Edit for clarification: the reason I need this is to simplify the parsing component of my application. The application is an assembly language teaching tool which allows students to write, compile, and execute assembly-like programs.

Currently I have a tokenizer class which converts input strings into tokens using regexes. This works very well. For example, given the input "INP :x:", the tokenizer produces the following tokens:

Token.OPCODE, Token.WHITESPACE, Token.LABEL, Token.EOL

These tokens are then analysed to ensure they conform to the syntax for a given statement. Currently this is done using IF statements and is proving cumbersome. The upside of this approach is that I can provide detailed error messages, e.g.

if(token[2] != Token.LABEL) { throw new SyntaxError("Expected label");}

I want to use a regular expression to define a syntax instead of the annoying IF statements. But in doing so I lose the ability to return detailed error reports. I therefore would at least like to inform the user of WHERE the error occurred.
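One possible way to recover a position from the stock Regex class is sketched below. It assumes the syntax can be written as an ordered list of sub-patterns, one per expected element, and nests them as optional named groups so that the overall match always succeeds and the first group that did not participate marks the failure. The class name `PartialMatch`, the method `FailureIndex`, and the group names `p0`, `p1`, ... are made up for this illustration and are not part of .NET. The reported index follows the engine's preferred path, so for patterns that can backtrack it is not necessarily the only defensible answer.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class PartialMatch
{
    // Hypothetical helper: returns -1 when every part matches in order from the
    // start of the input (trailing input is not checked), otherwise the index
    // at which the first unmatched part was expected to begin.
    public static int FailureIndex(string[] parts, string input)
    {
        // For parts A, B, C build: ^(?:(?<p0>A)(?:(?<p1>B)(?:(?<p2>C))?)?)?
        var sb = new StringBuilder("^");
        for (int i = 0; i < parts.Length; i++)
            sb.Append("(?:(?<p").Append(i).Append('>').Append(parts[i]).Append(')');
        for (int i = 0; i < parts.Length; i++)
            sb.Append(")?");

        // Every level is optional, so this match succeeds even if nothing fits.
        Match m = Regex.Match(input, sb.ToString());

        int end = 0;
        for (int i = 0; i < parts.Length; i++)
        {
            Group g = m.Groups["p" + i];
            if (!g.Success)
                return end;           // part i failed where the previous part ended
            end = g.Index + g.Length;
        }
        return -1;
    }
}
```

With the question's example, `PartialMatch.FailureIndex(new[] { "a", "b", "c" }, "abd")` returns 2. With illustrative sub-patterns for an opcode, whitespace, a label, and end of line (`"[A-Z]{3}"`, `"[ \t]+"`, `@":\w+:"`, `"$"`), the input "INP :x:" gives -1 and "INP x" gives 4, the position where the label was expected. Sub-patterns that contain their own groups named `p0`, `p1`, ... would clash with this scheme.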

Richard Walton asked Sep 20 '08 06:09


2 Answers

I agree with Colin Younger, I don't think it is possible with the existing Regex class. However, I think it is doable if you are willing to sweat a little:

  1. Get the Regex class source code (e.g. use http://www.codeplex.com/NetMassDownloader to download the .NET source).
  2. Change the code to expose a read-only property holding the failure index.
  3. Make sure your code uses that Regex rather than Microsoft's.

torial answered Oct 16 '22 14:10


I guess such an index would only be meaningful in simple cases, like the one in your example.

If you take a regex like "ab*c*z" (where by * I mean any character) and the string "abbbcbbcdd", which index would you be talking about? It depends on the algorithm used for matching: it could fail at "abbbc..." or at "abbbcbbc...".
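The ambiguity can be made concrete with a small sketch. It assumes the `*` in the pattern above stands for `.*` in .NET syntax, and it uses a hypothetical helper `PrefixLength` (not a .NET API) that reports how much of the input a pattern consumes when anchored at the start.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AmbiguityDemo
{
    // Hypothetical helper: length of the match of `pattern` anchored at the
    // start of `input`, or -1 if the pattern does not match there.
    public static int PrefixLength(string pattern, string input)
    {
        Match m = Regex.Match(input, "^(?:" + pattern + ")");
        return m.Success ? m.Length : -1;
    }
}
```

For the input "abbbcbbcdd", `PrefixLength("ab.*?c", ...)` returns 5 (the prefix "abbbc"), `PrefixLength("ab.*c", ...)` returns 8 (the prefix "abbbcbbc"), and the full pattern `"ab.*c.*z"` returns -1. Both prefixes are valid places where the search for "z" began and then ran out of input, so either could be called the point of failure.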

Max Galkin answered Oct 16 '22 14:10