RegEx for matching/replacing JavaScript comments (both multiline and inline)

I need to remove all JavaScript comments from a JavaScript source using the JavaScript RegExp object.

What I need is the pattern for the RegExp.

So far, I've found this:

compressed = compressed.replace(/\/\*.+?\*\/|\/\/.*(?=[\n\r])/g, ''); 

This pattern works OK for:

/* I'm a comment */ 

or for:

/*  * I'm a comment aswell */ 

But it doesn't seem to work for inline comments:

// I'm an inline comment 

I'm not an expert in RegEx and its patterns, so I need help.

Also, I would like to have a RegEx pattern which would remove all those HTML-like comments.

<!-- HTML Comment //--> or <!-- HTML Comment --> 

And also those conditional HTML comments, which can be found in various JavaScript sources.

Thanks.

asked May 13 '11 08:05 by metaforce


1 Answer

NOTE: Regex is not a lexer or a parser. If you have some weird edge case where you need some oddly nested comments parsed out of a string, use a parser. For the other 98% of the time this regex should work.

I had pretty complex block comments going on with nested asterisks, slashes, etc. The regular expression at the following site worked like a charm:

http://upshots.org/javascript/javascript-regexp-to-remove-comments
(see below for original)

Some modifications have been made, but the integrity of the original regex has been preserved. In order to allow certain double-slash (//) sequences (such as URLs), you must use back reference $1 in your replacement value instead of an empty string. Here it is:

/\/\*[\s\S]*?\*\/|([^\\:]|^)\/\/.*$/gm

// JavaScript:
// source_string.replace(/\/\*[\s\S]*?\*\/|([^\\:]|^)\/\/.*$/gm, '$1');

// PHP:
// preg_replace("/\/\*[\s\S]*?\*\/|([^\\:]|^)\/\/.*$/m", "$1", $source_string);
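To make the role of $1 concrete, here is a minimal sketch of the JavaScript usage. The pattern is the one quoted above; the helper name stripComments and the sample input are assumptions made up for illustration.

```javascript
// Pattern copied from the answer above.
const COMMENT_RE = /\/\*[\s\S]*?\*\/|([^\\:]|^)\/\/.*$/gm;

// stripComments is a hypothetical helper name, not part of any library.
function stripComments(source) {
  // '$1' puts back the single character captured in front of "//".
  // For block comments the group does not participate, so '$1' is ''.
  return source.replace(COMMENT_RE, '$1');
}

// Illustrative input: a block comment, a URL, and two line comments.
const sample = [
  '/* block',
  ' * comment */',
  'var url = "http://example.com"; // trailing comment',
  '// full-line comment',
  'var x = 1;',
].join('\n');

const stripped = stripComments(sample);
console.log(stripped);
```

The "//" inside the URL survives because the character in front of it is a colon, which the group ([^\\:]|^) refuses to match. Replacing with an empty string instead of '$1' would also delete the character sitting in front of each line comment.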

DEMO: https://regex101.com/r/B8WkuX/1

FAILING USE CASES: There are a few edge cases where this regex fails. An ongoing list of those cases is documented in this public gist. Please update the gist if you can find other cases.
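As a rough illustration of the kind of input that trips the pattern, the sketch below feeds it string literals that merely look like comments. These two inputs are my own examples; I am not claiming they are the ones listed in the gist.

```javascript
// Pattern copied from the answer above.
const COMMENT_RE = /\/\*[\s\S]*?\*\/|([^\\:]|^)\/\/.*$/gm;

// "//" inside a string literal, not preceded by ":" or "\".
const slashesInString = 'var s = "a // b";';
const brokenLine = slashesInString.replace(COMMENT_RE, '$1');

// "/* ... */" inside a string literal.
const blockInString = 'var t = "/* not a comment */ still code";';
const brokenBlock = blockInString.replace(COMMENT_RE, '$1');

console.log(JSON.stringify(brokenLine));  // the rest of the line after "a " is lost
console.log(JSON.stringify(brokenBlock)); // the text between /* and */ is lost
```

Both results are no longer the original statements, because the regex has no notion of being inside a string literal. That is the situation where a real lexer or parser is the safer tool.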

...and if you also want to remove <!-- html comments --> use this:

/\/\*[\s\S]*?\*\/|([^\\:]|^)\/\/.*|<!--[\s\S]*?-->$/ 
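Note that the pattern as printed carries no flags, and its $ only applies to the HTML branch. The sketch below is a variant under my own assumptions, not the answer's exact text: it adds the g and m flags and moves $ to the // branch so that HTML comments can end anywhere on a line. The sample input is made up.

```javascript
// Assumed variant: flags "gm" added, "$" attached to the "//" branch only.
const COMMENT_OR_HTML_RE =
  /\/\*[\s\S]*?\*\/|([^\\:]|^)\/\/.*$|<!--[\s\S]*?-->/gm;

// Illustrative input mixing HTML-style comments and a line comment.
const mixed = [
  '<!-- HTML Comment //-->',
  'var a = 1; // note',
  '<!-- HTML',
  '     Comment -->',
  'var b = 2;',
].join('\n');

// As before, '$1' restores the character captured in front of "//".
const cleaned = mixed.replace(COMMENT_OR_HTML_RE, '$1');
console.log(cleaned);
```

The <!-- ... //--> form is consumed by the HTML branch as a whole, because a match starting at "<" is found before the scanner ever reaches the "//" further along the line.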

(original - for historical reference only)

// DO NOT USE THIS - SEE ABOVE
/(\/\*([\s\S]*?)\*\/)|(\/\/(.*)$)/gm
answered Sep 20 '22 13:09 by Ryan Wheale