
Comprehensive RegExp to remove JavaScript comments

I need to dependably remove all JavaScript comments with a single Regular Expression.

I have searched StackOverflow, and other sites, but none take into account alternating quotes, multi-line comments, comments within strings, regular expressions, etc.

Is there a regular expression that can remove the comments from this:

var test = [
    "// Code",
    '// Code',
    "'// Code",
    '"// Code',
    //" Comment",
    //' Comment',
    /* Comment */
    // Comment /* Comment
    /* Comment
     Comment // */ "Code",
    "Code",
    "/* Code */",
    "/* Code",
    "Code */",
    '/* Code */',
    '/* Code',
    'Code */',
    /* Comment
    "Comment",
    Comment */ "Code",
    /Code\/*/,
    "Code */"
]

Here's a jsbin or jsfiddle to test it.

wizulus asked Jul 01 '14 19:07


1 Answer

I like challenges :)

Here's my working solution:

/((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm

Replace that with $1.
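
As a rough sketch of how that replacement might be wired up (the helper name `stripComments` is made up for illustration, it is not from the fiddle), here is the regex applied to the sample from the question:

```javascript
// The pattern from the answer, written as a regex literal.
const COMMENT_RE = /((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm;

// Hypothetical helper name, used only for this example.
function stripComments(source) {
  // Strings and regex literals are captured in group 1 and put back as-is.
  // For a comment match group 1 is unset, so "$1" expands to an empty string.
  return source.replace(COMMENT_RE, "$1");
}

// The sample from the question. String.raw keeps the backslash in /Code\/*/.
const sample = String.raw`var test = [
    "// Code",
    '// Code',
    "'// Code",
    '"// Code',
    //" Comment",
    //' Comment',
    /* Comment */
    // Comment /* Comment
    /* Comment
     Comment // */ "Code",
    "Code",
    "/* Code */",
    "/* Code",
    "Code */",
    '/* Code */',
    '/* Code',
    'Code */',
    /* Comment
    "Comment",
    Comment */ "Code",
    /Code\/*/,
    "Code */"
]`;

const stripped = stripComments(sample);
console.log(stripped); // the array with every "Comment" gone, blank lines left behind
```

The removed comments leave their surrounding whitespace in place, so the output contains a few empty lines.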

Fiddle here: http://jsfiddle.net/LucasTrz/DtGq8/6/

Of course, as it has been pointed out countless times, a proper parser would probably be better, but still...

NB: I used a regex literal in the fiddle instead of a regex string; too much escaping can destroy your brain.


Breakdown

((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/) <-- the part to keep
|\/\/.*?$                                                         <-- line comments
|\/\*[\s\S]*?\*\/                                                 <-- inline comments

The part to keep

(["'])(?:\\[\s\S]|.)*?\2                   <-- strings
\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/     <-- regex literals

Strings

    ["']              match a quote and capture it
    (?:\\[\s\S]|.)*?  match escaped characters or unescaped characters, don't capture
    \2                match the same type of quote as the one that opened the string
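
To see the string part in isolation, here is a small sketch. Note one assumption: pulled out of the full regex, the quote becomes the first capturing group, so the backreference is written \1 instead of \2. The input snippet is invented for the example.

```javascript
// String sub-pattern on its own; \1 refers to the opening quote.
const STRING_RE = /(["'])(?:\\[\s\S]|.)*?\1/g;

// String.raw keeps the backslashes exactly as they would appear in source code.
const code = String.raw`a = "he said \"hi\"" + 'it\'s' + "it's";`;

// Escaped quotes do not end the string, and the other quote type is just text.
const found = code.match(STRING_RE);
console.log(found); // three matches: "he said \"hi\"", 'it\'s' and "it's"
```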

Regex literals

    \/                          match a forward slash
    (?![*\/])                   ... not followed by a * or / (that would start a comment)
    (?:\\.|\[(?:\\.|.)\]|.)*?   match any sequence of escaped/unescaped text, or a regex character class
    \/                          ... until the closing slash
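
The regex-literal part can be tried on its own too. This is only a sketch with an invented input line. One thing worth noticing when reading the pattern literally: the class branch \[(?:\\.|.)\] has no quantifier, so it covers a class holding a single (possibly escaped) character, such as [/].

```javascript
// Regex-literal sub-pattern on its own.
const REGEX_LITERAL_RE = /\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\//g;

// String.raw keeps the backslash in /x\/y/ as it would appear in source code.
const code = String.raw`a = /x\/y/; b = /[/]/; // note`;

// The escaped slash and the slash inside the class do not close the literal,
// and the lookahead stops "//" from being taken as the start of one.
const found = code.match(REGEX_LITERAL_RE);
console.log(found); // two matches: /x\/y/ and /[/]/

// A block comment is not mistaken for a regex literal either.
const none = "/* c */".match(REGEX_LITERAL_RE);
console.log(none); // null
```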

The part to remove

|\/\/.*?$              <-- line comments
|\/\*[\s\S]*?\*\/      <-- inline comments

Line comments

    \/\/         match two forward slashes
    .*?$         then everything until the end of the line
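
The m flag matters here, because it is what lets $ match at the end of every line instead of only at the end of the input. A quick sketch with an invented two-line input:

```javascript
const WITH_M = /\/\/.*?$/gm;
const WITHOUT_M = /\/\/.*?$/g;

const code = "a = 1; // one\nb = 2; // two";

// With m, $ matches before each newline, so both comments are removed.
const withM = code.replace(WITH_M, "");
console.log(JSON.stringify(withM)); // "a = 1; \nb = 2; "

// Without m, $ only matches at the very end and . cannot cross the newline,
// so only the comment on the last line is removed.
const withoutM = code.replace(WITHOUT_M, "");
console.log(JSON.stringify(withoutM)); // "a = 1; // one\nb = 2; "
```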

Inline comments

    \/\*         match /*
    [\s\S]*?     then as few as possible of anything, see note below
    \*\/         match */

I had to use [\s\S] instead of . because, at the time of writing, JavaScript didn't support the regex s modifier (singleline, which lets . match newlines as well). ES2018 has since added it as the s (dotAll) flag.
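
A small sketch of the difference, using an invented input with a comment that spans two lines:

```javascript
const code = "a /* one\ntwo */ b";

// . stops at the newline, so the opening /* never reaches its closing */.
const withDot = code.replace(/\/\*.*?\*\//g, "");
console.log(JSON.stringify(withDot)); // "a /* one\ntwo */ b" (unchanged)

// [\s\S] matches any character, newlines included.
const withAnyChar = code.replace(/\/\*[\s\S]*?\*\//g, "");
console.log(JSON.stringify(withAnyChar)); // "a  b"
```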

This regex will work in the following corner cases:

  • Regex patterns containing / in character classes: /[/]/
  • Escaped newlines in string literals
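
Both corner cases can be checked with a short sketch like this one (the two input snippets are invented for the example):

```javascript
const COMMENT_RE = /((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm;

// A slash inside a character class does not end the regex literal.
const classCase = "var re = /[/]/; // comment".replace(COMMENT_RE, "$1");
console.log(JSON.stringify(classCase)); // "var re = /[/]/; "

// A backslash followed by a newline keeps the string going on the next line,
// so the "//" on that line is still part of the string.
const continued = "var s = 'a \\\n// still string'; // comment";
const newlineCase = continued.replace(COMMENT_RE, "$1");
console.log(JSON.stringify(newlineCase)); // "var s = 'a \\\n// still string'; "
```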

Final boss fight

And just for the fun of it... here's the eye-bleeding hardcore version:

/((["'])(?:\\[\s\S]|.)*?\2|(?:[^\w\s]|^)\s*\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/(?=[gmiy]{0,4}\s*(?![*\/])(?:\W|$)))|\/\/.*?$|\/\*[\s\S]*?\*\//gm

This adds support for the following twisted edge case (fiddle, regex101):

Code = /* Comment */ /Code regex/g  ; // Comment
Code = Code / Code /* Comment */ /g  ; // Comment    
Code = /Code regex/g /* Comment */  ; // Comment

This is highly heuristic code; you probably shouldn't use it (even less so than the previous regex) and should just let that edge case blow up.
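
For the curious, here is a sketch that runs both versions over the three lines above. The constant names are made up for the example. As far as I can trace it, the simpler regex takes the division sign on the second line for the start of a regex literal and leaves that block comment in place, while the hardcore version removes it.

```javascript
const SIMPLE_RE = /((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm;
const HARDCORE_RE = /((["'])(?:\\[\s\S]|.)*?\2|(?:[^\w\s]|^)\s*\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/(?=[gmiy]{0,4}\s*(?![*\/])(?:\W|$)))|\/\/.*?$|\/\*[\s\S]*?\*\//gm;

// The three edge-case lines, joined into one input.
const edgeCases = [
  "Code = /* Comment */ /Code regex/g  ; // Comment",
  "Code = Code / Code /* Comment */ /g  ; // Comment    ",
  "Code = /Code regex/g /* Comment */  ; // Comment",
].join("\n");

const simple = edgeCases.replace(SIMPLE_RE, "$1");
const hardcore = edgeCases.replace(HARDCORE_RE, "$1");

console.log(simple);   // the block comment on the second line is still there
console.log(hardcore); // no comments left, division and regex literals kept
```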

Lucas Trzesniewski answered Sep 20 '22 08:09