Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I strip multiline C comments from a file using Perl?

Tags:

c

comments

perl

Can anyone get me with the regular expression to strip multiline comments and single line comments in a file?

eg:

                  " WHOLE "/*...*/" HAS TO BE STRIPED OFF....."

1.   /* comment */
2.   /* comment1 */  code   /* comment2 */ #both /*comment1*/ and /*comment2*/ 
                                             #has to striped off and rest should 
                                                 #remain.
3.   /*.........
       .........
       .........
       ......... */

i realy appreciate you if u do this need.... thanks in advance.

like image 369
User1611 Avatar asked May 18 '09 12:05

User1611


1 Answers

From perlfaq6 "How do I use a regular expression to strip C style comments from a file?":


While this actually can be done, it's much harder than you'd think. For example, this one-liner

perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c

will work in many but not all cases. You see, it's too simple-minded for certain kinds of C programs, in particular, those with what appear to be comments in quoted strings. For that, you'd need something like this, created by Jeffrey Friedl and later modified by Fred Curtis.

$/ = undef;
$_ = <>;
s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
print;

This could, of course, be more legibly written with the /x modifier, adding whitespace and comments. Here it is expanded, courtesy of Fred Curtis.

s{
   /\*         ##  Start of /* ... */ comment
   [^*]*\*+    ##  Non-* followed by 1-or-more *'s
   (
     [^/*][^*]*\*+
   )*          ##  0-or-more things which don't start with /
               ##    but do end with '*'
   /           ##  End of /* ... */ comment

 |         ##     OR  various things which aren't comments:

   (
     "           ##  Start of " ... " string
     (
       \\.           ##  Escaped char
     |               ##    OR
       [^"\\]        ##  Non "\
     )*
     "           ##  End of " ... " string

   |         ##     OR

     '           ##  Start of ' ... ' string
     (
       \\.           ##  Escaped char
     |               ##    OR
       [^'\\]        ##  Non '\
     )*
     '           ##  End of ' ... ' string

   |         ##     OR

     .           ##  Anything other char
     [^/"'\\]*   ##  Chars which doesn't start a comment, string or escape
   )
 }{defined $2 ? $2 : ""}gxse;

A slight modification also removes C++ comments, possibly spanning multiple lines using a continuation character:

 s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse;
like image 160
brian d foy Avatar answered Oct 14 '22 09:10

brian d foy