How would you use sed to delete all comments (introduced by #) from a file, while respecting a '#' that appears inside a string?
This helped out a lot except for the string portion.
There seems to be no way to delete all contents of the file. How can I delete all contents of a file using the sed command?
This might work for you (GNU sed):
sed '/#/!b;s/^/\n/;ta;:a;s/\n$//;t;s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta;s/\n\([^#]\)/\1\n/;ta;s/\n.*//' file
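A quick way to try the command above is to wrap it in a function and feed it a few lines. This is only a sketch: the function name strip_comments and the sample lines are made up for illustration, and GNU sed is assumed.

```shell
# Hypothetical wrapper around the command above; requires GNU sed.
strip_comments() {
  sed '/#/!b;s/^/\n/;ta;:a;s/\n$//;t;s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta;s/\n\([^#]\)/\1\n/;ta;s/\n.*//'
}

# Made-up sample input: '#' inside double quotes, inside single quotes,
# a whole-line comment, and a line with no '#' at all.
strip_comments <<'EOF'
echo "a # b" # trailing comment
x='#' # another
# whole line
plain
EOF
```

The quoted # characters survive and the trailing comments are cut. Note that whitespace before a removed comment is left in place, and a whole-line comment becomes an empty line rather than disappearing.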
/#/!b
if the line does not contain a #, bail out
s/^/\n/
insert a unique marker (\n)
ta;:a
jump to a loop label (resets the substitute true/false flag)
s/\n$//;t
if the marker is at the end of the line, remove it and bail out
s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta
if the text following the marker is a quoted string, bump the marker forward of it and loop
s/\n\([^#]\)/\1\n/;ta
if the character following the marker is not a #, bump the marker forward of it and loop
s/\n.*//
the remainder of the line is a comment; remove the marker and the rest of the line
If # always means comment, and can appear anywhere on a line (like after some code):
sed 's:#.*$::g' <file-name>
If you want to change it in place, add the -i switch:
sed -i 's:#.*$::g' <file-name>
This will delete from any # to the end of the line, ignoring any context. If you use # anywhere where it's not a comment (like in a string), it will delete that too.
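To see the string problem concretely, here is a small sketch. The function name strip_naive and the two sample lines are invented for the demonstration.

```shell
# Hypothetical wrapper around the context-free substitution.
strip_naive() {
  sed 's:#.*$::g'
}

# Made-up sample: one real comment and one '#' inside a string.
strip_naive <<'EOF'
x=1 # set x
echo "issue #42"
EOF
```

The first line loses its comment as intended, but the second is cut down to echo "issue and is left with an unterminated string.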
If comments can only start at the beginning of a line, do something like this:
sed 's:^#.*$::g' <file-name>
If they may be preceded by whitespace, but nothing else, do:
sed 's:^\s*#.*$::g' <file-name>
These two will be a little safer because they likely won't delete valid usage of # in your code, such as in strings.
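A small sketch of the trade-off, assuming GNU sed (for \s); the function name strip_leading and the sample lines are made up.

```shell
# Hypothetical wrapper around the whitespace-tolerant, line-anchored variant.
strip_leading() {
  sed 's:^\s*#.*$::g'
}

# Made-up sample: an indented whole-line comment, then code that has
# both a '#' in a string and a trailing comment.
strip_leading <<'EOF'
  # indented comment
echo "issue #42"  # trailing
EOF
```

The string is untouched, but so is the trailing comment on the same line. Also note that the substitution only empties the comment line; the line itself stays in the output as a blank line.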
Edit:
There's not really a nice way of detecting whether something is in a string. I'd use the last two if that would satisfy the constraints of your language.
The problem with detecting whether you're in a string is that regular expressions can't do everything. There are a few problems:
A regular expression can't match nested quotes (these cases will confuse the regex):
# "hello there"
# hello there"
"# hello there"
If double quotes are the only way strings are defined, double quotes will never appear in a comment, and strings cannot span multiple lines, try something like this:
sed 's:#[^"]*$::g' <file-name>
That's a lot of pre-conditions, but if they all hold, you're in business. Otherwise, I'm afraid you're SOL, and you'd be better off writing it in something like Python, where you can do more advanced logic.
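Here is a sketch of what happens when the pre-conditions hold and when they don't. The function name strip_unquoted and the sample lines are invented for illustration.

```shell
# Hypothetical wrapper: a '#' only starts a comment if no double quote
# follows it on the same line.
strip_unquoted() {
  sed 's:#[^"]*$::g'
}

# Made-up sample: the first line meets the pre-conditions,
# the second has a double quote inside the comment.
strip_unquoted <<'EOF'
echo "# hello there" # greet
x=1 # say "hi"
EOF
```

On the first line the # inside the string is kept and the trailing comment is removed. On the second line the comment contains double quotes, so nothing is removed at all.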
Since the asker provided no sample input, I will assume a couple of cases, and that the input file is a Bash script, because the question is tagged bash.
Case 1: entire line is the comment
The following should be sufficient in most cases:
sed '/^\s*#/d' file
It matches any line that has zero or more leading white-space characters (space, tab, or a few others; see man isspace), followed by a #, and then deletes the line with the d command.
Lines like these will be deleted:
# comment started from beginning.
    # any number of white-space character before
# or 'quote' in "here"
But
a="foobar in #comment"
will not be deleted, which is the desired result.
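Case 1 can be checked with a short sketch, assuming GNU sed. The function name drop_comment_lines is made up; the sample lines are the ones used above.

```shell
# Hypothetical wrapper around the whole-line delete.
drop_comment_lines() {
  sed '/^\s*#/d'
}

# Three comment lines followed by one line of real code.
drop_comment_lines <<'EOF'
# comment started from beginning.
    # any number of white-space character before
# or 'quote' in "here"
a="foobar in #comment"
EOF
```

Only the a="foobar in #comment" line is printed. Unlike a substitution, d removes the line entirely, so no blank lines are left behind.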
Case 2: comment after actual code
For example:
if [[ $foo == "#bar" ]]; then # comment here
The comment part can be removed by
sed "s/\s*#[^\"']*$//" file
[^\"'] is used to prevent quoted string confusion; however, it also means that comments containing a quotation mark (' or ") will not be removed.
Final sed
sed "/^\s*#/d;s/\s*#[^\"']*$//" file
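The combined command can be tried on a small made-up script. This is a sketch only: the function name strip_bash_comments and the sample input are assumptions, and GNU sed is required.

```shell
# Hypothetical wrapper around the final command.
strip_bash_comments() {
  sed "/^\s*#/d;s/\s*#[^\"']*$//"
}

# Made-up sample covering both cases, plus a comment containing a quote.
strip_bash_comments <<'EOF'
# header comment
if [[ $foo == "#bar" ]]; then # comment here
  echo "done" # don't panic
fi
EOF
```

The header line is dropped and the first trailing comment is removed while "#bar" is kept. The comment on the echo line stays, because the apostrophe in don't triggers the limitation described above.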