I need to build a regular expression that finds the word "int" only if it's not part of some string.
I want to find whether int is used in the code. (not in some string, only in regular code)
Example:
int i; // the regex should find this one.
String example = "int i"; // the regex should ignore this line.
logger.i("int"); // the regex should ignore this line.
logger.i("int") + int.toString(); // the regex should find this one (because of the second int)
thanks!
It's not going to be bullet-proof, but this works for all your test cases:
(?<=^([^"]*|[^"]*"[^"]*"[^"]*))\bint\b(?=([^"]*|[^"]*"[^"]*"[^"]*)$)
It does a look behind and look ahead to assert that there's either none or two preceding/following quotes "
Here's the code in java with the output:
String regex = "(?<=^([^\"]*|[^\"]*\"[^\"]*\"[^\"]*))\\bint\\b(?=([^\"]*|[^\"]*\"[^\"]*\"[^\"]*)$)";
System.out.println(regex);
String[] tests = new String[] {
"int i;",
"String example = \"int i\";",
"logger.i(\"int\");",
"logger.i(\"int\") + int.toString();" };
for (String test : tests) {
System.out.println(test.matches("^.*" + regex + ".*$") + ": " + test);
}
Output (included regex so you can read it without all those \
escapes):
(?<=^([^"]*|[^"]*"[^"]*"[^"]*))\bint\b(?=([^"]*|[^"]*"[^"]*"[^"]*)$)
true: int i;
false: String example = "int i";
false: logger.i("int");
true: logger.i("int") + int.toString();
Using a regex is never going to be 100% accurate - you need a language parser. Consider escaped quotes in Strings "foo\"bar"
, in-line comments /* foo " bar */
, etc.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With