I'm looking for Python code that removes C and C++ comments from a string. (Assume the string contains an entire C source file.)
I realize that I could .match() substrings with a Regex, but that doesn't solve nesting /*, or having a // inside a /* */.
Ideally, I would prefer a non-naive implementation that properly handles awkward cases.
This handles C++-style comments, C-style comments, strings and simple nesting thereof.
    import re

    def comment_remover(text):
        def replacer(match):
            s = match.group(0)
            if s.startswith('/'):
                return " "  # note: a space and not an empty string
            else:
                return s  # string or character literal: keep unchanged
        # Alternatives, in order: // comment, /* */ comment,
        # character literal, string literal.
        pattern = re.compile(
            r'//.*?$|/\*.*?\*/|\'(?:\\.|[^\\\'])*\'|"(?:\\.|[^\\"])*"',
            re.DOTALL | re.MULTILINE
        )
        return re.sub(pattern, replacer, text)

Strings need to be included, because comment markers inside them do not start a comment.
Edit: re.sub didn't take any flags, so had to compile the pattern first.
Edit2: Added character literals, since they could contain quotes that would otherwise be recognized as string delimiters.
Edit3: Fixed the case where a legal expression int/**/x=5; would become intx=5; which would not compile, by replacing the comment with a space rather than an empty string.
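As a quick sanity check of the cases mentioned in the edits above, here is a minimal sketch that repeats the function and runs it on a few hand-written inputs. The sample strings are made up for illustration, and passing them says nothing about inputs they do not cover. In particular, a // comment continued onto the next line with a trailing backslash, or a C++ raw string literal, would likely still trip this pattern.

```python
import re

# Same pattern as in the answer above: comments first, then character and
# string literals, so that comment markers inside literals are left alone.
_PATTERN = re.compile(
    r'//.*?$|/\*.*?\*/|\'(?:\\.|[^\\\'])*\'|"(?:\\.|[^\\"])*"',
    re.DOTALL | re.MULTILINE,
)


def comment_remover(text):
    def replacer(match):
        s = match.group(0)
        # Comments start with '/', literals start with a quote character.
        return " " if s.startswith("/") else s

    return _PATTERN.sub(replacer, text)


# Hypothetical sample inputs, one per case discussed above.
samples = [
    'int/**/x=5;',                                    # comment between tokens
    'char *s = "// not a comment"; // real comment',  # marker inside a string
    "char q = '\"'; /* block */ int y;",              # quote in a char literal
    'a /* spans\ntwo lines */ b',                     # multi-line block comment
]

for src in samples:
    print(repr(comment_remover(src)))
```

Each comment is replaced by a single space, so the output can contain runs of spaces or trailing whitespace where a comment used to be. If that matters, a separate whitespace clean-up pass would be needed.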