
Delete all comments in a file using sed

Tags: bash, sed

How would you use sed to delete all comments (introduced by #) from a file, while leaving alone any '#' that appears inside a string?

This helped out a lot except for the string portion.

Logick asked Nov 25 '12 05:11





3 Answers

This might work for you (GNU sed):

sed '/#/!b;s/^/\n/;ta;:a;s/\n$//;t;s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta;s/\n\([^#]\)/\1\n/;ta;s/\n.*//' file
  • /#/!b if the line does not contain a # bail out
  • s/^/\n/ insert a unique marker (\n)
  • ta;:a jump to a loop label (resets the substitute true/false flag)
  • s/\n$//;t if marker at the end of the line, remove and bail out
  • s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta if the string following the marker is a quoted one, bump the marker forward of it and loop.
  • s/\n\([^#]\)/\1\n/;ta if the character following the marker is not a #, bump the marker forward of it and loop.
  • s/\n.*// the remainder of the line is comment, remove the marker and the rest of line.
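To watch the marker loop at work, here is a small sketch that runs the same command on a few made-up lines. The sample input and the function name strip_comments are assumptions for illustration only, and GNU sed is assumed, as in the answer.

```shell
# Hypothetical wrapper around the one-liner above; requires GNU sed.
strip_comments() {
  sed '/#/!b;s/^/\n/;ta;:a;s/\n$//;t;s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta;s/\n\([^#]\)/\1\n/;ta;s/\n.*//'
}

# Made-up sample input: a '#' inside double or single quotes must survive,
# and a line without any '#' must pass through untouched.
out=$(printf '%s\n' \
  'a="x # y" # trailing comment' \
  "b='#' # another" \
  'c=1' | strip_comments)

printf '%s\n' "$out"
```

The quoted '#' characters are kept and only the trailing comments go. The whitespace that preceded each removed comment stays on the line, since the last substitution deletes from the marker onward and nothing before it.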
potong answered Nov 04 '22 10:11



If # always means comment, and can appear anywhere on a line (like after some code):

sed 's:#.*$::g' <file-name>

If you want to change it in place, add the -i switch:

sed -i 's:#.*$::g' <file-name>

This will delete from any # to the end of the line, ignoring any context. If you use # anywhere where it's not a comment (like in a string), it will delete that too.
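A quick sketch of that failure mode, using a made-up input line (the variable names line and blunt are just for illustration):

```shell
# Made-up input: the first '#' sits inside a double-quoted string.
line='echo "item #1" # print the item'

# The blunt substitution cuts at the first '#', whether or not it is in a string.
blunt=$(printf '%s\n' "$line" | sed 's:#.*$::g')

printf '%s\n' "$blunt"
```

Everything from the '#' inside the string onward is gone, so the remaining line has an unterminated quote.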

If comments can only start at the beginning of a line, do something like this:

sed 's:^#.*$::g' <file-name>

If they may be preceded by whitespace, but nothing else, do:

sed 's:^\s*#.*$::g' <file-name>

These two will be a little safer because they likely won't delete valid usage of # in your code, such as in strings.
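As a rough check of the whitespace-tolerant variant, here is a sketch on three made-up lines (the variable names are hypothetical, and \s is a GNU sed extension):

```shell
# Made-up input: a full-line comment, an indented comment, and code
# with both a '#' in a string and a trailing comment.
input=$(printf '%s\n' \
  '# full-line comment' \
  '   # indented comment' \
  'x="#1" # trailing')

# Only lines that start with optional whitespace and '#' are affected.
anchored=$(printf '%s\n' "$input" | sed 's:^\s*#.*$::g')

printf '%s\n' "$anchored"
```

Two things to note: the comment lines are emptied rather than removed, so blank lines are left behind, and the trailing comment on the third line is not touched at all.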

Edit:

There's not really a nice way of detecting whether something is in a string. I'd use the last two if that would satisfy the constraints of your language.

The problem with detecting whether you're in a string is that regular expressions can't do everything. There are a few problems:

  • Strings can likely span lines
  • A regular expression can't tell the difference between apostrophes and single quotes
  • A regular expression can't match nested quotes (these cases will confuse the regex):

    # "hello there"
    # hello there"
    "# hello there"
    

If double quotes are the only way strings are defined, double quotes will never appear in a comment, and strings cannot span multiple lines, try something like this:

sed 's:#[^"]*$::g' <file-name>

That's a lot of pre-conditions, but if they all hold, you're in business. Otherwise, I'm afraid you're SOL, and you'd be better off writing it in something like Python, where you can do more advanced logic.
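Here is a small sketch of what happens when the pre-conditions hold and when one of them is broken. The inputs and the function name strip_dq are made up for illustration.

```shell
# Hypothetical helper wrapping the double-quote-aware substitution above.
strip_dq() { sed 's:#[^"]*$::g'; }

# Pre-conditions hold: the only double quotes are in the string.
ok=$(printf '%s\n' 'echo "item #1" # print the item' | strip_dq)

# Pre-condition broken: the comment itself contains double quotes,
# so no '#' is followed only by non-quote characters up to end of line.
missed=$(printf '%s\n' 'x=1 # say "hi"' | strip_dq)

printf '%s\n' "$ok" "$missed"
```

In the first case the string is preserved and the comment is removed. In the second case the line comes back unchanged, comment included, which is the safe way to fail but still a miss.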

beatgammit answered Nov 04 '22 08:11



Since the asker provided no sample input, I will assume a couple of cases, and I will assume the input file is a Bash script because the question is tagged bash.

Case 1: entire line is the comment

The following should be sufficient in most cases:

sed '/^\s*#/d' file

It matches any line that has zero or more leading white-space characters (space, tab, or a few others; see man isspace) followed by a #, and deletes that line with the d command.

Lines like these will be deleted:

# comment started from beginning.
         # any number of white-space character before
    # or 'quote' in "here"

But

a="foobar in #comment"

will not be deleted, which is the desired result.

Case 2: comment after actual code

For example:

if [[ $foo == "#bar" ]]; then # comment here

The comment part can be removed by

sed "s/\s*#[^\"']*$//" file

[^\"'] is used to avoid confusion with quoted strings; however, it also means that comments containing a quotation mark (' or ") will not be removed.

Final sed

sed "/^\s*#/d;s/\s*#[^\"']*$//" file
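A sketch of the final command on the examples from this answer, plus one made-up line with an apostrophe in its comment to show the limitation. The function name strip_bash_comments is hypothetical, and GNU sed is assumed for \s.

```shell
# Hypothetical wrapper around the final sed command; reads stdin.
strip_bash_comments() {
  sed "/^\s*#/d;s/\s*#[^\"']*$//"
}

out=$(printf '%s\n' \
  '# comment started from beginning.' \
  'a="foobar in #comment"' \
  'if [[ $foo == "#bar" ]]; then # comment here' \
  "x=1 # don't miss this" | strip_bash_comments)

printf '%s\n' "$out"
```

The full-line comment is deleted, the '#' characters inside strings survive, and the trailing comment after then is stripped along with the whitespace before it. The last line keeps its comment, because the apostrophe in it stops the match.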
livibetter answered Nov 04 '22 08:11
