I need to reliably remove all JavaScript comments with a single regular expression.
I have searched StackOverflow and other sites, but none of the solutions take into account alternating quotes, multi-line comments, comments within strings, regular expressions, etc.
Is there a regular expression that can remove the comments from this:
var test = [
"// Code",
'// Code',
"'// Code",
'"// Code',
//" Comment",
//' Comment',
/* Comment */
// Comment /* Comment
/* Comment
Comment // */ "Code",
"Code",
"/* Code */",
"/* Code",
"Code */",
'/* Code */',
'/* Code',
'Code */',
/* Comment
"Comment",
Comment */ "Code",
/Code\/*/,
"Code */"
]
Here's a jsbin or jsfiddle to test it.
I like challenges :)
Here's my working solution:
/((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm
Replace that with $1.
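A minimal sketch of how the replacement can be applied. The helper name stripComments and the sample input are assumptions made for illustration; only the regex itself comes from the answer.

```javascript
// The regex from the answer, unchanged.
const COMMENTS = /((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm;

// stripComments is a hypothetical helper name, not part of the original answer.
function stripComments(source) {
  // Strings and regex literals are captured in group 1 and put back as-is.
  // For comments, group 1 does not participate, so "$1" expands to "".
  return source.replace(COMMENTS, "$1");
}

// Hypothetical sample input: comment-like text inside strings must survive.
const input = [
  'var a = "// Code"; // Comment',
  "var b = '/* Code */'; /* Comment",
  "   still a comment */ var c = 1;",
].join("\n");

console.log(stripComments(input));
```

Note that whitespace around a removed comment is left in place, so the output may contain trailing or doubled spaces.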
Fiddle here: http://jsfiddle.net/LucasTrz/DtGq8/6/
Of course, as it has been pointed out countless times, a proper parser would probably be better, but still...
NB: I used a regex literal in the fiddle instead of a regex string; too much escaping can destroy your brain.
((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/) <-- the part to keep
|\/\/.*?$ <-- line comments
|\/\*[\s\S]*?\*\/ <-- block comments
(["'])(?:\\[\s\S]|.)*?\2 <-- strings
\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/ <-- regex literals
["'] match a quote and capture it
(?:\\[\s\S]|.)*? match escaped characters or unescaped characters, don't capture
\2 match the same type of quote as the one that opened the string
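To see the string part in isolation, here is a small sketch. Used on its own the quote becomes group 1 rather than group 2, so the backreference is written \1; the names STRING_PART and sample are assumptions for illustration.

```javascript
// The string sub-pattern on its own (quote captured in group 1 here).
const STRING_PART = /(["'])(?:\\[\s\S]|.)*?\1/g;

// String.raw keeps the backslash in the sample as a literal character.
const sample = String.raw`a = "it's"; b = 'x \' y'; c = "// no";`;

// Each match runs from an opening quote to the next unescaped quote
// of the same kind; the other quote type and escaped quotes are skipped over.
console.log(sample.match(STRING_PART));
```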
\/ match a forward slash
(?![*\/]) ... not followed by a * or / (that would start a comment)
(?:\\.|\[(?:\\.|.)\]|.)*? match any sequence of escaped/unescaped text, or a regex character class
\/ ... until the closing slash
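The regex literal part can be tried on its own in the same way. This is only a sketch with made-up inputs; REGEX_LITERAL_PART is a hypothetical name.

```javascript
// The regex-literal sub-pattern on its own.
const REGEX_LITERAL_PART = /\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\//g;

// An escaped slash inside the literal does not end the match.
console.log(String.raw`x = /ab\/c/;`.match(REGEX_LITERAL_PART));

// The negative lookahead rejects the start of a comment, and no other
// slash in these inputs has a closing slash after it, so both give null.
console.log("/* Comment */ 1".match(REGEX_LITERAL_PART));
console.log("// Comment".match(REGEX_LITERAL_PART));
```

On its own this pattern would also treat division operators as regex delimiters, which is why it is only a heuristic.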
|\/\/.*?$ <-- line comments
|\/\*[\s\S]*?\*\/ <-- block comments
\/\/ match two forward slashes
.*?$ then everything until the end of the line
\/\* match /*
[\s\S]*? then as few as possible of anything, see note below
\*\/ match */
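The two comment patterns can also be tested separately. The sketch below uses invented input, and its last line shows why strings have to be matched first in the combined regex: on its own, the line comment pattern happily matches inside a string.

```javascript
// The two comment sub-patterns on their own.
const LINE_COMMENT = /\/\/.*?$/gm;
const BLOCK_COMMENT = /\/\*[\s\S]*?\*\//g;

const code = "a; // one\nb; /* two\nthree */ c;";

// With the m flag, $ matches at each line end, so the match stops there.
console.log(code.match(LINE_COMMENT));
// [\s\S] lets the block comment span the line break.
console.log(code.match(BLOCK_COMMENT));

// Without the string alternative in front, string contents are not protected.
console.log('"// Code"'.match(LINE_COMMENT));
```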
I had to use [\s\S] instead of . because, at the time of writing, JavaScript didn't support the regex s modifier (singleline, which allows . to match newlines as well). Engines that implement ES2018 or later do support it as the s (dotAll) flag.
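A quick sketch of the difference, using a made-up two-line comment. The last pattern uses the s (dotAll) flag, which assumes a runtime implementing ES2018 or later; older engines will reject it as a syntax error.

```javascript
const block = "/* Comment\nComment */";

// . does not match a newline, so the comment is never closed.
const withDot = /\/\*.*?\*\//;
// [\s\S] matches any character, newlines included.
const withClass = /\/\*[\s\S]*?\*\//;
// Assumption: the engine supports the ES2018 s flag.
const withDotAll = /\/\*.*?\*\//s;

console.log(withDot.test(block));    // false
console.log(withClass.test(block));  // true
console.log(withDotAll.test(block)); // true
```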
This regex will work in the following corner case: a / inside a character class, as in /[/]/
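The sketch below illustrates that corner case. NAIVE is a deliberately cut-down pattern without the character class branch, invented here only for comparison; it is not something the answer proposes.

```javascript
// The regex from the answer, unchanged.
const COMMENTS = /((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm;

// Hypothetical variant with the \[(?:\\.|.)\] branch removed.
const NAIVE = /(\/(?![*\/])(?:\\.|.)*?\/)|\/\/.*?$/gm;

const src = "var re = /[/]/; // comment";

// The answer's regex treats [/] as a unit, keeps the literal whole,
// and then removes the trailing comment.
console.log(JSON.stringify(src.replace(COMMENTS, "$1")));

// The naive variant ends the literal at the slash inside the class,
// pairs up the remaining slashes wrongly, and the comment survives.
console.log(JSON.stringify(src.replace(NAIVE, "$1")));
```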
And just for the fun of it... here's the eye-bleeding hardcore version:
/((["'])(?:\\[\s\S]|.)*?\2|(?:[^\w\s]|^)\s*\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/(?=[gmiy]{0,4}\s*(?![*\/])(?:\W|$)))|\/\/.*?$|\/\*[\s\S]*?\*\//gm
This adds support for the following twisted edge case (fiddle, regex101):
Code = /* Comment */ /Code regex/g ; // Comment
Code = Code / Code /* Comment */ /g ; // Comment
Code = /Code regex/g /* Comment */ ; // Comment
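To compare the two versions on these lines, here is a rough sketch. The names SIMPLE, HARDCORE, and strip are assumptions, and each line is processed separately to keep the behaviour easy to follow.

```javascript
// Both regexes copied from the answer.
const SIMPLE = /((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm;
const HARDCORE = /((["'])(?:\\[\s\S]|.)*?\2|(?:[^\w\s]|^)\s*\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/(?=[gmiy]{0,4}\s*(?![*\/])(?:\W|$)))|\/\/.*?$|\/\*[\s\S]*?\*\//gm;

// Hypothetical helper: apply one of the regexes with the $1 replacement.
const strip = (regex, source) => source.replace(regex, "$1");

const lines = [
  "Code = /* Comment */ /Code regex/g ; // Comment",
  "Code = Code / Code /* Comment */ /g ; // Comment",
  "Code = /Code regex/g /* Comment */ ; // Comment",
];

for (const line of lines) {
  console.log("simple:  ", JSON.stringify(strip(SIMPLE, line)));
  console.log("hardcore:", JSON.stringify(strip(HARDCORE, line)));
}
```

On the second line the simple version mistakes the division operators for regex delimiters and leaves the block comment in place, while the hardcore version removes it.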
This is highly heuristic code; you probably shouldn't use it (even less so than the previous regex) and should just let that edge case blow up.