
Removing C++ Comments From Source Code

I have some C++ code with /* */ and // style comments, and I want a way to remove them all automatically. At first glance, an editor (e.g. UltraEdit) with a regexp search for /*, */ and // should do the job. On a closer look, though, a complete solution isn't that simple, because the sequences /* and // do not start a comment when they appear inside another comment, a string literal or a character literal. For example,

printf(" \" \" " "  /* this is not a comment and is surrounded by an unknown number of double-quotes */");

contains a comment sequence inside a string literal, and it isn't a simple task to determine whether a piece of text lies inside a pair of valid double quotes. While this

// this is a single line comment /* <--- this does not start a comment block 
// this is a second comment line with an */ within

contains comment sequences inside other comments.

Is there a more comprehensive way to remove comments from C++ source that takes string literals and comments into account? For example, can we instruct the preprocessor to remove comments without carrying out, say, the #include directives?

JavaMan asked May 26 '26 00:05


2 Answers

The C pre-processor can remove the comments.

Edited:

I have updated the answer so that the predefined macros are available and the #if statements are evaluated correctly.

> cat t.cpp
/*
 * Normal comment
 */
// this is a single line comment /* <--- this does not start a comment block 
// this is a second comment line with an */ within
#include <stdio.h>

#if __SIZEOF_LONG__ == 4
int bits = 32;
#else
int bits = 16;
#endif

int main()
{
    printf(" \" \" " " /* this is not a comment and is surrounded by an unknown number of double-quotes */");
    /*
     * comment with a single // line comment embedded.
     */
    int x;
    // A single line comment /* Normal embedded */ Comment
}

Because we want the #if statements to be evaluated correctly, we need the list of predefined macros.
That's relatively trivial: cpp -E -dM.

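Then we pipe the #defines and the original file back through the preprocessor, but this time we prevent the includes from being expanded: sed prefixes each #include line with a - so that cpp does not treat it as a directive, and a second sed strips the prefix again afterwards.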

> cpp -E -dM t.cpp > /tmp/def
> cat /tmp/def t.cpp | sed -e 's/^#inc/-#inc/' | cpp - | sed 's/^-#inc/#inc/'
# 1 "t.cpp"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "t.cpp"






#include <stdio.h>


int bits = 32;




int main()
{
    printf(" \" \" " " /* this is not a comment and is surrounded by an unknown number of double-quotes */");    



    int x;

}
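
If the preprocessor is not an option, the same effect can be approximated with a small hand-written scanner. The following is only a minimal sketch, not part of the approach above: strip_comments is a hypothetical helper name, and it assumes the source has no raw string literals, no backslash-newline line splices, no digit separators such as 1'000 and no trigraphs. It tracks whether the current character is in code, a comment, a string literal or a character literal, so that /* and // inside quotes or inside another comment are left alone.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: removes // and /* */ comments from C++ source text.
// Assumptions (not handled): raw string literals, backslash-newline splices,
// digit separators (1'000) and trigraphs.
std::string strip_comments(const std::string& src) {
    enum class State { Code, LineComment, BlockComment, String, Char };
    State state = State::Code;
    std::string out;
    out.reserve(src.size());

    for (std::size_t i = 0; i < src.size(); ++i) {
        const char c = src[i];
        const char next = (i + 1 < src.size()) ? src[i + 1] : '\0';

        switch (state) {
        case State::Code:
            if (c == '/' && next == '/') {
                state = State::LineComment;
                ++i;  // skip the second '/'
            } else if (c == '/' && next == '*') {
                state = State::BlockComment;
                ++i;  // skip the '*'
            } else {
                if (c == '"') state = State::String;
                else if (c == '\'') state = State::Char;
                out += c;
            }
            break;
        case State::LineComment:
            // A line comment ends at the newline, which is kept.
            if (c == '\n') {
                state = State::Code;
                out += c;
            }
            break;
        case State::BlockComment:
            // A block comment is replaced by a single space so that
            // neighbouring tokens do not merge.
            if (c == '*' && next == '/') {
                state = State::Code;
                ++i;  // skip the closing '/'
                out += ' ';
            }
            break;
        case State::String:
        case State::Char:
            out += c;
            if (c == '\\' && i + 1 < src.size()) {
                out += next;  // copy the escaped character verbatim
                ++i;
            } else if ((state == State::String && c == '"') ||
                       (state == State::Char && c == '\'')) {
                state = State::Code;
            }
            break;
        }
    }
    return out;
}
```

Unlike the cpp pipeline, this leaves #include and #if lines untouched because it never interprets directives. The trade-off is that every lexical corner case listed in the assumptions has to be handled by hand.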
Martin York answered May 28 '26 12:05


Our SD C++ Formatter has an option to pretty print the source text and remove all comments. It uses our full C++ front end to parse the text, so it is not confused by whitespace, line breaks, string literals or preprocessor issues, nor will it break the code by its formatting changes.

If you are removing comments, you may be trying to obfuscate the source code. The Formatter also comes in an obfuscating version.

Ira Baxter answered May 28 '26 14:05


