How to match a comment unless it's in a quoted string?

Question

So I have some string:

//Blah blah blach
// sdfkjlasdf
"Another //thing"

And I'm using a Java regex to remove all the lines that have double slashes, like so:

theString = Pattern.compile("//(.*?)\n", Pattern.DOTALL).matcher(theString).replaceAll("");

It works for the most part, but it removes every occurrence, and I need it to leave the quoted one alone. How would I go about doing that?
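A minimal reproduction of the behaviour described, assuming the sample text uses `\n` line endings. The class name `NaiveStripDemo` and the `strip` helper are made up for this sketch; only the pattern comes from the snippet above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: reproduces the behaviour described above.
class NaiveStripDemo {

    // Same pattern as in the question: everything from "//" up to the next newline.
    static String strip(String theString) {
        return Pattern.compile("//(.*?)\n", Pattern.DOTALL)
                      .matcher(theString)
                      .replaceAll("");
    }

    public static void main(String[] args) {
        String theString =
            "//Blah blah blach\n" +
            "// sdfkjlasdf\n" +
            "\"Another //thing\"\n";
        // The regex knows nothing about quotes, so the text after "//" inside
        // the string literal is removed too, closing quote included.
        System.out.println(strip(theString));
    }
}
```

Both comment lines disappear as intended, but only the opening quote and the word `Another` survive from the third line.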

Bart Kiers · Accepted Answer

Instead of using a parser that parses an entire Java source file, or writing something yourself that parses only the parts you're interested in, you could use a third-party tool like ANTLR.

ANTLR has the ability to define only those tokens you are interested in (and of course the tokens that can mess up your token-stream like multi-line comments and String- and char literals). So you only need to define a lexer (another word for tokenizer) that correctly handles those tokens.
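To see why string literals, char literals and multi-line comments have to be recognized as well, here is a rough hand-written equivalent in plain Java. It is only a sketch and not ANTLR code: the class `CommentScanner` and its methods are hypothetical names, and unterminated literals or comments are simply consumed to the end of the input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ANTLR: a hand-written scanner that knows
// about single-line comments, multi-line comments, string and char literals.
class CommentScanner {

    /** Returns the text of every real single-line comment in the source. */
    static List<String> findSingleLineComments(String src) {
        List<String> comments = new ArrayList<>();
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (src.startsWith("//", i)) {
                // Single-line comment: runs up to, but not including, the line break.
                int end = i;
                while (end < n && src.charAt(end) != '\n' && src.charAt(end) != '\r') {
                    end++;
                }
                comments.add(src.substring(i, end));
                i = end;
            } else if (src.startsWith("/*", i)) {
                // Multi-line comment: skip to the closing "*/" (or to the end if unterminated).
                int close = src.indexOf("*/", i + 2);
                i = (close < 0) ? n : close + 2;
            } else if (c == '"' || c == '\'') {
                // String or char literal: skip to the matching quote, honoring backslash escapes.
                i = skipQuoted(src, i);
            } else {
                i++;
            }
        }
        return comments;
    }

    /** Returns the index just past the literal that starts at src.charAt(start). */
    private static int skipQuoted(String src, int start) {
        char quote = src.charAt(start);
        int i = start + 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\\') {
                i += 2;          // skip the escaped character as well
            } else if (c == quote) {
                return i + 1;    // closing quote found
            } else {
                i++;
            }
        }
        return src.length();     // unterminated literal: consume the rest
    }
}
```

For the sample from the question, `findSingleLineComments` returns only the two real comments and leaves `"Another //thing"` alone. With ANTLR, the same rules are declared in a lexer definition instead of being coded by hand.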

This is called a grammar. In ANTLR, such a grammar could look like this:

lexer grammar FuzzyJavaLexer;

options{filter=true;}

SingleLineComment
  :  '//' ~( '\r' | '\n' )*
  ;

MultiLineComment
  :  '/*' .* '*/'
  ;

StringLiteral
  :  '"' ( '\\' . | ~( '"' | '\\' ) )* '"'
  ;

CharLiteral
  :  '\'' ( '\\' . | ~( '\'' | '\\' ) )* '\''
  ;

Save the above in a file called FuzzyJavaLexer.g. Now download ANTLR 3.2 (the antlr-3.2.jar file) and save it in the same folder as your FuzzyJavaLexer.g file.

Execute the following command:

java -cp antlr-3.2.jar org.antlr.Tool FuzzyJavaLexer.g

which will create a FuzzyJavaLexer.java source class.

Of course you need to test the lexer, which you can do by creating a file called FuzzyJavaLexerTest.java and copying the code below into it:

import org.antlr.runtime.*;

public class FuzzyJavaLexerTest {
    public static void main(String[] args) throws Exception {
        String source =
            "class Test {                                 \n"+
            "  String s = \" ... \\\" // no comment \";   \n"+
            "  /*                                         \n"+
            "   * also no comment: // foo                 \n"+
            "   */                                        \n"+
            "  char quote = '\"';                         \n"+
            "  // yes, a comment, finally!!!              \n"+
            "  int i = 0; // another comment              \n"+
            "}                                            \n";
        System.out.println("===== source =====");
        System.out.println(source);
        System.out.println("==================");
        ANTLRStringStream in = new ANTLRStringStream(source);
        FuzzyJavaLexer lexer = new FuzzyJavaLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        for(Object obj : tokens.getTokens()) {
            Token token = (Token)obj;
            if(token.getType() == FuzzyJavaLexer.SingleLineComment) {
                System.out.println("Found a SingleLineComment on line "+token.getLine()+
                        ", starting at column "+token.getCharPositionInLine()+
                        ", text: "+token.getText());
            }
        }
    }
}

Next, compile your FuzzyJavaLexer.java and FuzzyJavaLexerTest.java by doing:

javac -cp .:antlr-3.2.jar *.java

and finally execute the FuzzyJavaLexerTest.class file:

// *nix/MacOS
java -cp .:antlr-3.2.jar FuzzyJavaLexerTest

or:

// Windows
java -cp .;antlr-3.2.jar FuzzyJavaLexerTest

after which you'll see the following being printed to your console:

===== source =====
class Test {                                 
  String s = " ... \" // no comment ";   
  /*                                         
   * also no comment: // foo                 
   */                                        
  char quote = '"';                         
  // yes, a comment, finally!!!              
  int i = 0; // another comment              
}                                            

==================
Found a SingleLineComment on line 7, starting at column 2, text: // yes, a comment, finally!!!              
Found a SingleLineComment on line 8, starting at column 13, text: // another comment

Pretty easy, eh? :)
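If the goal is to actually remove the comments, as the question asks, and pulling in ANTLR is not an option, the same four token rules can be approximated with `java.util.regex`. This is only a sketch, not part of the ANTLR solution above: the class and method names are made up, and unterminated literals or comments are not handled specially. The idea is to match string literals, char literals and multi-line comments as well, and to put those back untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: strips single-line comments with java.util.regex only.
class CommentStripper {

    // Alternatives are tried left to right at each position, so a literal or a
    // multi-line comment is consumed as a whole before "//" inside it can match.
    private static final Pattern TOKEN = Pattern.compile(
          "\"(?:\\\\.|[^\"\\\\])*\""        // string literal
        + "|'(?:\\\\.|[^'\\\\])*'"          // char literal
        + "|/\\*.*?\\*/"                    // multi-line comment
        + "|(//[^\\r\\n]*)",                // group 1: single-line comment
        Pattern.DOTALL);

    static String stripSingleLineComments(String src) {
        Matcher m = TOKEN.matcher(src);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Drop the match if it is a single-line comment, otherwise copy it back unchanged.
            String replacement = (m.group(1) != null) ? "" : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Unlike the regex in the question, this leaves the line break after each removed comment in place, so the line numbers of the remaining code do not shift.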
