
Remove C and C++ comments using Python?

I'm looking for Python code that removes C and C++ comments from a string. (Assume the string contains an entire C source file.)

I realize that I could .match() substrings with a regex, but that doesn't handle a nested /*, or a // inside a /* */.

Ideally, I would prefer a non-naive implementation that properly handles awkward cases.

TomZ asked Oct 27 '08 20:10



1 Answer

This handles C++-style comments, C-style comments, strings and simple nesting thereof.

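    import re

    def comment_remover(text):
        def replacer(match):
            s = match.group(0)
            if s.startswith('/'):
                # The match is a comment: drop it.
                return " "  # note: a space and not an empty string
            else:
                # The match is a string or character literal: keep it.
                return s
        # Alternatives, in order: // comment, /* comment */,
        # 'character literal', "string literal".
        pattern = re.compile(
            r'//.*?$|/\*.*?\*/|\'(?:\\.|[^\\\'])*\'|"(?:\\.|[^\\"])*"',
            re.DOTALL | re.MULTILINE
        )
        return re.sub(pattern, replacer, text)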

Strings need to be included in the pattern, because comment markers inside them do not start a comment.
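As a rough illustration of why the string alternative matters, here is a small self-contained check. The function is repeated so the snippet runs on its own; the sample C source in `src` is made up for this example and is not from the question.

```python
import re

def comment_remover(text):
    def replacer(match):
        s = match.group(0)
        # Comments start with '/', literals start with a quote.
        return " " if s.startswith('/') else s
    pattern = re.compile(
        r'//.*?$|/\*.*?\*/|\'(?:\\.|[^\\\'])*\'|"(?:\\.|[^\\"])*"',
        re.DOTALL | re.MULTILINE
    )
    return re.sub(pattern, replacer, text)

# Hypothetical input: a string literal that contains a comment marker,
# followed by a real line comment and a real block comment.
src = 'char *s = "// not a comment"; // real comment\nint x; /* block */\n'
out = comment_remover(src)
print(out)
```

The string literal is matched as a whole before the `//` inside it can be seen as a comment start, so it passes through unchanged, while the two real comments are each replaced by a space.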

Edit: re.sub didn't take any flags, so I had to compile the pattern first.

Edit2: Added character literals, since they could contain quotes that would otherwise be recognized as string delimiters.

Edit3: Fixed the case where a legal expression int/**/x=5; would become intx=5; which would not compile, by replacing the comment with a space rather than an empty string.
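The two later edits can be checked with a short sketch. The name `remove_comments` and its `replacement` parameter are hypothetical, added here only so the space and empty-string behaviours can be compared side by side; the regex is the same one used in the answer, and the sample inputs are made up.

```python
import re

# Same pattern as in the answer: comments first, then character and string literals.
PATTERN = re.compile(
    r'//.*?$|/\*.*?\*/|\'(?:\\.|[^\\\'])*\'|"(?:\\.|[^\\"])*"',
    re.DOTALL | re.MULTILINE
)

def remove_comments(text, replacement=" "):
    # `replacement` is an illustrative parameter, not part of the original answer.
    def replacer(match):
        s = match.group(0)
        return replacement if s.startswith('/') else s
    return PATTERN.sub(replacer, text)

# Edit3: a comment used as a token separator.
with_space = remove_comments('int/**/x=5;')           # 'int x=5;'
without_space = remove_comments('int/**/x=5;', "")    # 'intx=5;'

# Edit2: a character literal holding a double quote must not open a string.
chars = remove_comments("char q = '\"'; /* gone */ char *s = \"kept\";")

print(with_space)
print(without_space)
print(chars)
```

With an empty replacement the two tokens fuse into `intx`, which is the failure Edit3 describes. In the last input, the `'"'` literal is consumed as a unit, so the block comment after it is still recognised and removed.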

Markus Jarderot answered Oct 07 '22 00:10