I'm looking for Python code that removes C and C++ comments from a string. (Assume the string contains an entire C source file.)
I realize that I could .match() substrings with a Regex, but that doesn't solve nesting /*
, or having a //
inside a /* */
.
Ideally, I would prefer a non-naive implementation that properly handles awkward cases.
To comment out multiple lines in Python, you can prepend each line with a hash ( # ).
In general, already-written C code will require no modifications to be used by Python. The only work we need to do to integrate C code in Python is on Python's side. The steps for interfacing Python with C using Ctypes.
This handles C++-style comments, C-style comments, strings and simple nesting thereof.
def comment_remover(text): def replacer(match): s = match.group(0) if s.startswith('/'): return " " # note: a space and not an empty string else: return s pattern = re.compile( r'//.*?$|/\*.*?\*/|\'(?:\\.|[^\\\'])*\'|"(?:\\.|[^\\"])*"', re.DOTALL | re.MULTILINE ) return re.sub(pattern, replacer, text)
Strings needs to be included, because comment-markers inside them does not start a comment.
Edit: re.sub didn't take any flags, so had to compile the pattern first.
Edit2: Added character literals, since they could contain quotes that would otherwise be recognized as string delimiters.
Edit3: Fixed the case where a legal expression int/**/x=5;
would become intx=5;
which would not compile, by replacing the comment with a space rather then an empty string.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With