 

Help building a regex

Tags: java, regex

I need to build a regular expression that finds the word "int" only when it is not part of a string literal.

I want to find out whether int is used in the code itself (not inside a string, only in regular code).

Example:

    int i;  // the regex should find this one.
    String example = "int i"; // the regex should ignore this line.
    logger.i("int"); // the regex should ignore this line.
    logger.i("int") + int.toString(); // the regex should find this one (because of the second int)

thanks!

asked Jun 26 '11 at 14:06 by Adibe7


1 Answer

It's not going to be bullet-proof, but this works for all your test cases:

    (?<=^([^"]*|[^"]*"[^"]*"[^"]*))\bint\b(?=([^"]*|[^"]*"[^"]*"[^"]*)$)

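It uses a lookbehind and a lookahead to assert that the text before the int and the text after it each contain either zero or two double quotes ("). The \b word boundaries make sure that identifiers such as print or integer are not matched.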

Here's the code in Java, with its output:

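    public class IntFinder {
        public static void main(String[] args) {
            // The text before "int" and the text after it must each contain zero or two double quotes
            String regex = "(?<=^([^\"]*|[^\"]*\"[^\"]*\"[^\"]*))\\bint\\b(?=([^\"]*|[^\"]*\"[^\"]*\"[^\"]*)$)";
            System.out.println(regex);
            String[] tests = new String[] {
                    "int i;",
                    "String example = \"int i\";",
                    "logger.i(\"int\");",
                    "logger.i(\"int\") + int.toString();" };

            // String.matches() must match the whole line, hence the ".*" on either side
            for (String test : tests) {
                System.out.println(test.matches("^.*" + regex + ".*$") + ": " + test);
            }
        }
    }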

Output (the regex is printed first so you can read it without all those \ escapes):

    (?<=^([^"]*|[^"]*"[^"]*"[^"]*))\bint\b(?=([^"]*|[^"]*"[^"]*"[^"]*)$)
    true: int i;
    false: String example = "int i";
    false: logger.i("int");
    true: logger.i("int") + int.toString();
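The lookarounds accept exactly zero or two quotes, so a line with four quotes before the int (for example a("x") + b("y") + int.class) would be reported as false. A minimal sketch of a more general check, under the same assumptions as the regex (no escaped quotes, no quotes inside comments or char literals): find each whole-word int with Matcher.find() and count the double quotes before it. An even count means the int sits outside any "..." literal. The class and method names (QuoteParity, hasIntOutsideString) are hypothetical, chosen for illustration.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class QuoteParity {
        // Whole-word "int", so "print" or "integer" are not matched
        private static final Pattern INT_WORD = Pattern.compile("\\bint\\b");

        // Assumption: no escaped quotes (\") and no quotes in comments or char literals
        public static boolean hasIntOutsideString(String line) {
            Matcher m = INT_WORD.matcher(line);
            while (m.find()) {
                // Count the double quotes that appear before this occurrence
                long quotesBefore = line.substring(0, m.start()).chars().filter(c -> c == '"').count();
                if (quotesBefore % 2 == 0) {
                    return true; // even count: we are not inside a "..." literal
                }
            }
            return false;
        }

        public static void main(String[] args) {
            String[] tests = {
                    "int i;",
                    "String example = \"int i\";",
                    "logger.i(\"int\");",
                    "logger.i(\"int\") + int.toString();",
                    "a(\"x\") + b(\"y\") + int.class" };
            for (String test : tests) {
                System.out.println(hasIntOutsideString(test) + ": " + test);
            }
        }
    }

For the four test lines from the question this gives the same true/false results as the output above; the fifth line, with four quotes before the int, comes out true.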

Using a regex is never going to be 100% accurate; for that you need a language parser. Consider escaped quotes inside strings ("foo\"bar"), in-line comments containing quotes (/* foo " bar */), and so on.
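One way to cope with escaped quotes and comments without writing a parser is a different technique from the lookarounds above: scan the text with an alternation that matches comments, string literals and char literals as whole tokens, and capture int only in the last branch. Because find() works left to right and a literal or comment starts before any int inside it, that int is consumed as part of the literal and never reaches the capturing branch, so group(1) is non-null only for an int in regular code. This is a sketch assuming Java-like syntax; it still doesn't understand text blocks ("""...""") or Unicode escapes such as \u0022, and IntTokenFinder and usesInt are hypothetical names.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class IntTokenFinder {
        // Comments and literals are matched as whole tokens; only a bare "int" fills group 1
        private static final Pattern TOKENS = Pattern.compile(
                "//[^\\n]*"                       // line comment
                + "|/\\*[\\s\\S]*?\\*/"           // block comment, may span lines
                + "|\"(?:\\\\.|[^\"\\\\])*\""     // string literal, allowing \" escapes
                + "|'(?:\\\\.|[^'\\\\])*'"        // char literal, e.g. '"'
                + "|\\b(int)\\b");                // the word we are looking for

        public static boolean usesInt(String source) {
            Matcher m = TOKENS.matcher(source);
            while (m.find()) {
                if (m.group(1) != null) {
                    return true; // an int outside comments and literals
                }
            }
            return false;
        }

        public static void main(String[] args) {
            String[] tests = {
                    "int i;",
                    "String example = \"int i\";",
                    "logger.i(\"int\");",
                    "logger.i(\"int\") + int.toString();",
                    "String s = \"foo\\\"int\\\"bar\";",   // escaped quotes inside the literal
                    "/* int */ long x;" };                // int only inside a comment
            for (String test : tests) {
                System.out.println(usesInt(test) + ": " + test);
            }
        }
    }

Running main prints true, false, false, true, false, false for these lines. Counting quotes would get the fifth line wrong, since the escaped \" inside the literal looks like a closing quote.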

answered Oct 05 '22 at 11:10 by Bohemian