Trim extra spaces in AWK

Tags: bash, awk

I have this AWK script.

awk -v line="    foo    bar  " 'END {
   gsub(/^ +| +$/,"", line);
   gsub(/ {2,}/, " ", line);
   print line
 }' \
somefile.txt

The input file (somefile.txt) is irrelevant to my question. The part that goes after the END pattern is there to trim extra spaces in the line variable and print it out. Like this:

foo bar

I'm trying to see if there is a better, more compact way to do that in AWK. Using gsub to remove a couple of extra spaces is cumbersome: it is hard to read, and hard for a maintainer to understand (especially one who has never worked with AWK before). Any ideas on how to make it shorter or more explicit?

Thanks!

** EDIT **

The AWK variable line is filtered while awk processes the input file, and I want to trim the extra spaces left after that.

Misha Slyusarev asked Jun 11 '26 17:06


2 Answers

Another option, building on the gsub() approach you started with, is:

awk '{gsub(/  +/," "); sub(/^ /,""); sub(/ $/,"")}1' <<< "    foo    bar  "

The call to gsub() collapses every run of two or more spaces into a single space, both before and between the fields. The following sub(/^ /,"") then trims the single space that remains at the front of the string, and the final sub(/ $/,"") trims the trailing space.

Either approach works well. Depending on your actual data and your FS value, there may be a preference for one over the other, but without knowing more, they are pretty much a wash.

Example Use/Output

$ awk '{gsub(/  +/," "); sub(/^ /,""); sub(/ $/,"")}1' <<< "    foo    bar  "
foo bar
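
If the goal is to make the intent more explicit for a maintainer, the same three calls can be wrapped in a named awk function and applied to a variable instead of $0. This is only a sketch: the names trim_spaces (shell wrapper) and squeeze_spaces (awk function) are hypothetical, and BEGIN is used so the example runs without an input file. In the original script the function would be called from END instead.

```shell
# Hypothetical shell wrapper so the sketch can be run on any string.
trim_spaces() {
  awk -v line="$1" '
    # Hypothetical helper: awk passes scalars by value, so the
    # caller'"'"'s variable is left untouched and the cleaned copy is returned.
    function squeeze_spaces(s) {
        gsub(/  +/, " ", s)   # collapse runs of 2+ spaces into one
        sub(/^ /, "", s)      # drop the single leading space, if any
        sub(/ $/, "", s)      # drop the single trailing space, if any
        return s
    }
    BEGIN { print "[" squeeze_spaces(line) "]" }
  '
}

trim_spaces "    foo    bar  "   # prints [foo bar]
```

The brackets are only there to make leading and trailing spaces visible. Note that this version handles spaces only; tabs are left as they are.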
David C. Rankin answered Jun 17 '26 06:06


For the current example, another option might be to rebuild the input record: first assign the value of line to the input record ($0), and then use $1=$1

awk -v line="    foo    bar  " 'END {$0=line; $1=$1; print}' somefile.txt

Output (the quotes are not printed; they are shown only to make clear that there are no leading or trailing spaces)

"foo bar"

How the spaces are removed is described in the comments by Ed Morton:

Setting $0=line or any other change to $0 would trigger the fields being recalculated.

Using $1=$1 triggers the record to be recalculated in as much as it'll be rebuilt from the existing fields thereby stripping leading/trailing white space and replacing every other chain of contiguous white space with a single blank char (assuming the default FS and OFS are used).
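
The two steps described above can be observed separately with a small experiment. This is a sketch, not part of the original answer: show_rebuild and its sep argument are hypothetical names, and BEGIN is used only so no input file is needed. It prints $0 right after the assignment, and again after $1=$1, with OFS set to whatever separator is passed in.

```shell
# Hypothetical helper: $1 is the text, $2 is the value to use for OFS.
show_rebuild() {
  awk -v line="$1" -v sep="$2" 'BEGIN {
      OFS = sep
      $0 = line            # fields are re-split, but $0 itself is unchanged
      print "[" $0 "]"
      $1 = $1              # forces $0 to be rebuilt from the fields, joined by OFS
      print "[" $0 "]"
  }'
}

show_rebuild "    foo    bar  " " "
# [    foo    bar  ]
# [foo bar]

show_rebuild "    foo    bar  " "-"
# [    foo    bar  ]
# [foo-bar]
```

The second call shows why the default OFS matters: the rebuilt record is joined with OFS, so a non-default value ends up between the fields.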

The fourth bird answered Jun 17 '26 05:06


