I'm doing some regular expression gymnastics. I set myself the task of trying to search for C# code where there is a usage of the as-operator not followed by a null-check within a reasonable amount of space. Now I don't want to parse the C# code. E.g. I want to capture code snippets such as
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
however, not capture
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1 == null)
nor for that matter
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(somethingunrelated == null) {...}
if(x1.a == y1.a)
Thus any random null-check counts as a "good check", and the snippet should then not be found.
The question is: how do I match something while ensuring that something else is not found in its surroundings?
I've tried the naive approach: looking for 'as', then doing a negative lookahead within 150 characters.
\bas\b.{1,150}(?!\b==\s*null\b)
Unfortunately, the above regular expression matches all of the above examples. My gut tells me the problem is that scanning ahead and then applying the negative lookahead leaves many positions where the lookahead does not see the '== null'.
Negating the whole expression doesn't help either, as that would match most C# code around.
I love regex gymnastics! Here is a commented PHP regex:
$re = '/# Find all AS, (but not preceding a XX == null).
\bas\b # Match "as"
(?= # But only if...
(?: # there exist from 1-150
[\S\s] # chars, each of which
(?!==\s*null) # are NOT preceding "=NULL"
){1,150}? # (and do this lazily)
(?: # We are done when either
(?= # we have reached
==\s*(?!null) # a non NULL conditional
) #
| $ # or the end of string.
)
)/ix';
And here it is in Javascript style:
re = /\bas\b(?=(?:[\S\s](?!==\s*null)){1,150}?(?:(?===\s*(?!null))|$))/ig;
This one did make my head hurt a little...
Here is the test data I am using:
text = r""" var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
however, not capture
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1 == null)
nor for that matter
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(somethingunrelated == null) {...}
if(x1.a == y1.a)"""
Put the .{1,150} inside the lookahead, and replace . with [\s\S] (in general, . doesn't match newlines). Also, the \b might be misleading near the ==, because \b needs a word character on one side and == is usually preceded by a space.
\bas\b(?![\s\S]{1,150}==\s*null\b)
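To check this against the three snippets from the question, here is a small sketch in Python. The names pattern, count_hits and the three snippet variables are mine, not from the question, and I am assuming Python's re module treats this lookahead like other Perl-style engines do.

```python
import re

# The pattern from above: "as" with no "== null" starting
# within the next 1 to 150 characters.
pattern = re.compile(r'\bas\b(?![\s\S]{1,150}==\s*null\b)')

# The three snippets from the question.
no_check = """var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)"""

direct_check = """var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1 == null)"""

unrelated_check = """var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(somethingunrelated == null) {...}
if(x1.a == y1.a)"""


def count_hits(source):
    """Return how many "as" operators have no null check nearby."""
    return len(pattern.findall(source))


for name, snippet in [("no_check", no_check),
                      ("direct_check", direct_check),
                      ("unrelated_check", unrelated_check)]:
    print(name, count_hits(snippet))
```

Only the first snippet should report hits, one per "as". Note that the 150-character window is counted from the "as" to the start of the "==", so a check further away is ignored.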
I think it would help to put the variable name into () so you can use it as a back reference. Something like the following,
\b(\w+)\b\W*=\W*\w*\W*\bas\b[\s\S]{1,150}(?!\b\1\b\W*==\W*\bnull\b)
The question isn't clear. What exactly do you want? I'm sorry, but I still don't understand after reading the question and the comments numerous times.
Must the code be in C#? In Python? Something else? There is no indication on this point.
Do you want a match only if an if(... == ...) line directly follows a block of var ... = ... lines? Or may an unrelated line sit BETWEEN the block and the if(... == ...) line without stopping the match? My code assumes the second option.
Does an if(... == null) line AFTER an if(... == ...) line stop the match or not? Unable to tell whether the answer is yes or no, I defined two regexes to cover both options.
I hope my code is clear enough and addresses your concern.
It is in Python:
import re
ch1 ='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
1618987987849891
'''
ch2 ='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
uydtdrdutdutrr
if(x1.a == y1.a)
3213546878'''
ch3='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1 == null)
165478964654456454'''
ch4='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
hgyrtdduihudgug
if(x1 == null)
165489746+54646544'''
ch5='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(somethingunrelated == null ) {...}
if(x1.a == y1.a)
1354687897'''
ch6='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
ifughobviudyhogiuvyhoiuhoiv
if(somethingunrelated == null ) {...}
if(x1.a == y1.a)
2468748874897498749874897'''
ch7 = '''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
iufxresguygo
liygcygfuihoiuguyg
if(somethingunrelated == null ) {...}
oufxsyrtuy
'''
ch8 = '''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
tfsezfuytfyfy
if(x1.a == y1.a)
iufxresguygo
liygcygfuihoiuguyg
if(somethingunrelated == null ) {...}
oufxsyrtuy
'''
ch9 = '''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
tfsezfuytfyfy
if(x1.a == y1.a)
if(somethingunrelated == null ) {...}
oufxsyrtuy
'''
# Shared core: a block of "var ... = ... as ..." lines, then any text that
# never runs into "== null", then an "if(... == ...)" line that is not a null check.
core = (r'('
        r'(^var +\S+ *= *\S+ +as .+[\r\n]+)+?'
        r'([\s\S](?!==\s*null\b))*?'
        r'^if *\( *[^\s=]+ *==(?!\s*null).+$'
        r')')
pat1 = re.compile(core, re.MULTILINE)
# pat2 also rejects the match when another "==" follows within 150 characters.
pat2 = re.compile(core + r'(?![\s\S]{0,150}==)', re.MULTILINE)

for ch in (ch1, ch2, ch3, ch4, ch5, ch6, ch7, ch8, ch9):
    m1 = pat1.search(ch)
    print(m1.group() if m1 else None)
    print()
    m2 = pat2.search(ch)
    print(m2.group() if m2 else None)
    print('-----------------------------------------')
Result
>>>
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
-----------------------------------------
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
uydtdrdutdutrr
if(x1.a == y1.a)
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
uydtdrdutdutrr
if(x1.a == y1.a)
-----------------------------------------
None
None
-----------------------------------------
None
None
-----------------------------------------
None
None
-----------------------------------------
None
None
-----------------------------------------
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
None
-----------------------------------------
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
tfsezfuytfyfy
if(x1.a == y1.a)
None
-----------------------------------------
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
tfsezfuytfyfy
if(x1.a == y1.a)
None
-----------------------------------------
>>>
Let me try to redefine your problem:
If there is an if (... == null) within 150 characters, don't match.
If there is no if (... == null) within 150 characters, match.
Your expression \bas\b.{1,150}(?!\b==\s*null\b) won't work because of the negative look-ahead. The regex can always move ahead or back one character to avoid the negative look-ahead, so you end up matching even when there is an if (... == null) there.
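To see this failure concretely, here is a minimal sketch in Python that runs the expression from the question on the snippet that does contain a null check. The names naive and checked are mine, and I added re.DOTALL so that . can cross line breaks, which the original expression does not do on its own.

```python
import re

# The expression from the question; re.DOTALL lets "." match newlines too.
naive = re.compile(r'\bas\b.{1,150}(?!\b==\s*null\b)', re.DOTALL)

# Snippet from the question that should NOT be reported.
checked = """var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1 == null)"""

m = naive.search(checked)

# A match is found even though a null check is present.
found = m is not None

# The greedy .{1,150} runs straight past the "== null", and the negative
# look-ahead is only tested at the position where .{1,150} stopped.
swallowed_check = found and '== null' in m.group()

print(found, swallowed_check)
```

The look-ahead is only ever evaluated at one position, the one where .{1,150} ends, and the engine is free to pick an end position where the look-ahead passes.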
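Regexes are really not good at not matching something. In this case, you're better off trying to match an "as" assignment with an "if == null" check within 150 characters (using [\s\S] rather than . so that the match can span lines, and no \b in front of the ==, which is usually preceded by a space):
\bas\b[\s\S]{1,150}?==\s*null\b
and then negating the check: if (!regex.match(text)) ...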
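As a sketch of the same idea applied per "as" rather than to the whole text, one could find each "as" first and then test the following 150 characters in ordinary code. This is in Python, and the names AS_RE, NULL_CHECK_RE and unchecked_casts are hypothetical, not part of any library.

```python
import re

AS_RE = re.compile(r'\bas\b')
NULL_CHECK_RE = re.compile(r'==\s*null\b')


def unchecked_casts(source, window=150):
    """Return the offsets of every "as" with no "== null" in the next `window` characters."""
    hits = []
    for m in AS_RE.finditer(source):
        # Positive match on the text after the "as", negated in code.
        following = source[m.end():m.end() + window]
        if not NULL_CHECK_RE.search(following):
            hits.append(m.start())
    return hits


unchecked = """var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)"""

checked = """var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(somethingunrelated == null) {...}
if(x1.a == y1.a)"""

print(unchecked_casts(unchecked))
print(unchecked_casts(checked))
```

One caveat of slicing a fixed window: a null check that straddles the end of the window is cut in half and not seen, so the window size is only approximate.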