
Regex headache

Tags: .net, regex

I want to validate some C# source code for a scripting engine. I want to make sure that only System.Math class members may be referenced. I am trying to create a regular expression that will match a dot, followed by a capital letter, followed by any number of word characters, ending at a word boundary, where the dot is NOT preceded by System.Math.

I started with this:

(?<!Math)\.[A-Z]+[\w]*

Which works fine for:

return Math.Max(466.89/83.449 * 5.5);  // won’t flag this
return Xath.Max(466.89/83.449 * 5.5);  // will flag this

It correctly matches .Max when it is not preceded by Math. However, now that I'm trying to expand the regular expression to include System, I can't get it to work.
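For readers who want to reproduce that behaviour, here is a minimal sketch. It uses Python's `re` module as a stand-in for .NET's `Regex` class, on the assumption that this pattern behaves the same in both engines for these ASCII inputs. The helper name `flagged` is made up for illustration.

```python
import re

# The first pattern from the question, copied unchanged.
PATTERN = re.compile(r"(?<!Math)\.[A-Z]+[\w]*")

def flagged(line):
    """Return every substring of `line` that the pattern matches."""
    return [m.group(0) for m in PATTERN.finditer(line)]

print(flagged("return Math.Max(466.89/83.449 * 5.5);"))  # []
print(flagged("return Xath.Max(466.89/83.449 * 5.5);"))  # ['.Max']
```

The decimal points in the numbers are never flagged because a digit, not a capital letter, follows each of them.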

I've tried these permutations of the regular expression and more:

((?<!System\.Math)\.[A-Z]+[\w]*)
((?<!(?<!System)\.Math)\.[A-Z]+[\w]*)
((?<!System)\.(?<!Math)\.[A-Z]+[\w]*)
((?<!System)|(?<!Math)\.[A-Z]+[\w]*)
((?<!System\.Math)|(?<!Math)\.[A-Z]+[\w]*)

Using these statements:

return System.Math.Max(466.89/83.449 * 5.5);
return System.Xath.Max(466.89/83.449 * 5.5);
return Xystem.Math.Max(466.89/83.449 * 5.5);

I've tried everything that I could think of, but it either ALWAYS matches the second element (.Math or .Xath above) or it DOESN'T match ANYTHING.
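The "always matches the second element" symptom can be reproduced with the first permutation in the list. This is an illustrative sketch only, again using Python's `re` as a stand-in for .NET (an assumption; the helper name `matches` is hypothetical). The lookbehind is evaluated at the position of the dot being matched, so at `.Math` the preceding text ends in `System`, not `System.Math`, and the assertion passes.

```python
import re

# First permutation from the list above, copied unchanged.
ATTEMPT = re.compile(r"((?<!System\.Math)\.[A-Z]+[\w]*)")

def matches(statement):
    """Return every substring of `statement` that the attempt matches."""
    return [m.group(0) for m in ATTEMPT.finditer(statement)]

print(matches("return System.Math.Max(466.89/83.449 * 5.5);"))  # ['.Math']
print(matches("return System.Xath.Max(466.89/83.449 * 5.5);"))  # ['.Xath', '.Max']
print(matches("return Xystem.Math.Max(466.89/83.449 * 5.5);"))  # ['.Math', '.Max']
```

In every case the second element is matched, because nothing stops the regex from starting a match in the middle of a qualified name.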

If anyone would have mercy on me and point out what I'm doing wrong, I would greatly appreciate it.

Thanks in advance, Welton

Welton v3.61 asked Aug 13 '10 00:08

2 Answers

The trick is to make sure you never start matching a member name anywhere but at its beginning. Then it's a simple matter of using a lookahead to find out whether whatever you're looking at starts with System.Math. (including the trailing dot). Try this regex:

(?<![\w.])(?!(?:System\.)?Math\.)(?:[A-Z]\w*\.)+[A-Z]\w*\b

The lookbehind ensures that the match doesn't start in the middle of a word (\w) or the middle of a qualified member name (.). Now, if the lookahead fails, the regex can't just jump to the beginning of the next component (e.g., the Math. in System.Math.) and try again. It's all or nothing.

However, this will match Math.Max when the System. prefix is absent. Do you really need that, or was that just an intermediate step in developing a regex for the full name?

EDIT: I went ahead and made the System. part optional.
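A quick way to check the regex as it stands after the edit, sketched in Python rather than C# on the assumption that `re` and .NET's `Regex` treat this pattern identically for these inputs. The helper name `first_violation` is made up for illustration.

```python
import re

# The regex from this answer (with the optional System. part), copied unchanged.
PATTERN = re.compile(
    r"(?<![\w.])(?!(?:System\.)?Math\.)(?:[A-Z]\w*\.)+[A-Z]\w*\b"
)

def first_violation(statement):
    """Return the first disallowed qualified name, or None if there is none."""
    m = PATTERN.search(statement)
    return m.group(0) if m else None

print(first_violation("return System.Math.Max(466.89/83.449 * 5.5);"))  # None
print(first_violation("return Math.Max(466.89/83.449 * 5.5);"))         # None
print(first_violation("return System.Xath.Max(466.89/83.449 * 5.5);"))  # System.Xath.Max
print(first_violation("return Xystem.Math.Max(466.89/83.449 * 5.5);"))  # Xystem.Math.Max
```

Because the match can only begin at the start of a qualified name, a violation is reported as the whole name rather than as one component of it.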

Alan Moore answered Nov 29 '22 06:11

If you are just looking for what you stated in the example, this regex will do it.

^[\w\s]*?[A-Z]\w+\.[A-Z]\w+\.(?<!System\.Math\.)

It matches all calls to something OTHER than System.Math.XXX as long as: a) there are two . in the call, b) that call is on one line.

return System.Math.Max(466.89/83.449 * 5.5); // no match
return System.Xath.Max(466.89/83.449 * 5.5); // match
return Xystem.Math.Max(466.89/83.449 * 5.5); // match
System.Math.Max(466.89/83.449 * 5.5);  // no match
System.Xath.Max(466.89/83.449 * 5.5);  // match
Xystem.Math.Max(466.89/83.449 * 5.5);  // match
Math.Max(466.89/83.449 * 5.5);               // no match - only one '.'
System.Max.Math(466.89/83.449 * 5.5);        // match

I agree with the comments, though: any regex is pretty fragile and should only be thought of as a text-editor-style aid. You need a parser if you want it to be bulletproof.
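One concrete illustration of that fragility, sketched in Python as a stand-in for .NET (an assumption, as is the helper name `is_flagged`): the leading `[\w\s]*?` can only cross word characters and whitespace, so a disallowed call that follows an operator such as `=` on the same line is never reached.

```python
import re

# The regex from this answer, copied unchanged.
PATTERN = re.compile(r"^[\w\s]*?[A-Z]\w+\.[A-Z]\w+\.(?<!System\.Math\.)")

def is_flagged(line):
    """Return True when the pattern finds a disallowed call in `line`."""
    return PATTERN.search(line) is not None

# Caught: only word characters and whitespace precede the call.
print(is_flagged("return System.Xath.Max(466.89/83.449 * 5.5);"))      # True

# Missed: the '=' stops the prefix before it can reach System.Xath.
print(is_flagged("double x = System.Xath.Max(466.89/83.449 * 5.5);"))  # False
```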

dawg answered Nov 29 '22 08:11