I'm writing a program to automate the writing of some C code (it generates code that parses strings into enumerations with the same name). C's handling of strings is not that great, so some people have been nagging me to try Python.
I made a function that is supposed to remove C-style /* COMMENT */ and //COMMENT from a string. Here is the code:
    import re

    def removeComments(string):
        # remove all occurrences of block comments (/* COMMENT */) from string
        re.sub(re.compile("/\*.*?\*/", re.DOTALL), "", string)
        # remove all occurrences of single-line comments (//COMMENT\n) from string
        re.sub(re.compile("//.*?\n"), "", string)
So I tried this code out.
    str = "/* spam * spam */ eggs"
    removeComments(str)
    print str
And it apparently did nothing.
Any suggestions as to what I've done wrong?
There's a saying I've heard a couple of times:
If you have a problem and you try to solve it with Regex you end up with two problems.
EDIT: Looking back at this years later (after a fair bit more parsing experience), I think regex may have been the right solution, and the simple regex used here was "good enough". I may not have emphasized this enough in the question: this was for a single specific file that had no tricky situations. I think it would be a lot less maintenance to keep the file being parsed simple enough for the regex than to complicate the regex into an unreadable symbol soup (e.g. by requiring that the file only use // single-line comments).
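For reference, the function in the question appears to do nothing because re.sub returns a new string and leaves its argument untouched (Python strings are immutable), and the function never returns that result. A minimal sketch of the simple approach with the results kept, assuming Python 3; the name remove_comments_simple is made up for illustration and the two patterns are the ones from the question:

```python
import re

# Patterns taken from the question, compiled once.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)  # /* ... */, may span lines
LINE_COMMENT = re.compile(r"//.*?\n")                # // ... up to and including the newline


def remove_comments_simple(text):
    """Return a copy of text with /* ... */ and // ... comments removed."""
    # re.sub never modifies its input; the result has to be kept and returned.
    text = BLOCK_COMMENT.sub("", text)
    text = LINE_COMMENT.sub("", text)
    return text


source = "/* spam * spam */ eggs"
cleaned = remove_comments_simple(source)
print(repr(cleaned))  # ' eggs'
```

Note that the // pattern, kept as written in the question, also consumes the trailing newline and will not match a comment on a final line that has no newline. Neither pattern knows about string literals, so this only suits input that is kept deliberately simple.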
What about "//comment-like strings inside quotes"?
OP is asking how to do it using regular expressions, so:
    import re

    def remove_comments(string):
        pattern = r"(\".*?\"|\'.*?\')|(/\*.*?\*/|//[^\r\n]*$)"
        # first group captures quoted strings (double or single)
        # second group captures comments (//single-line or /* multi-line */)
        regex = re.compile(pattern, re.MULTILINE|re.DOTALL)
        def _replacer(match):
            # if the 2nd group (capturing comments) is not None,
            # it means we have captured a non-quoted (real) comment string.
            if match.group(2) is not None:
                return "" # so we will return empty to remove the comment
            else: # otherwise, we will return the 1st group
                return match.group(1) # captured quoted-string
        return regex.sub(_replacer, string)
This WILL remove:
/* multi-line comments */
// single-line comments
Will NOT remove:
String var1 = "this is /* not a comment. */";
char *var2 = "this is // not a comment, either.";
url = 'http://not.comment.com';
Note: This will also work for JavaScript source.
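A quick way to check the behavior described above is to run the function on a small input that mixes real comments with comment-like text inside quotes. This is only a sketch assuming Python 3: the function body repeats the answer's code so the block runs on its own, and the sample text is made up for illustration.

```python
import re


def remove_comments(string):
    # Group 1: quoted strings (kept). Group 2: comments (removed).
    pattern = r"(\".*?\"|\'.*?\')|(/\*.*?\*/|//[^\r\n]*$)"
    regex = re.compile(pattern, re.MULTILINE | re.DOTALL)

    def _replacer(match):
        if match.group(2) is not None:
            return ""            # a real comment: drop it
        return match.group(1)    # a quoted string: put it back unchanged

    return regex.sub(_replacer, string)


# Hypothetical sample input, not taken from any real file.
sample = '''/* multi-line
comment */
char *var2 = "this is // not a comment, either."; // trailing comment
url = 'http://not.comment.com';
'''

result = remove_comments(sample)
print(result)
```

The block comment and the trailing // comment are removed, while both quoted strings come through intact. One limitation worth keeping in mind: the quoted-string alternatives do not account for escaped quotes such as \" inside a string literal, so input that contains them may need a more careful pattern.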