
How to preserve the original whitespace between fields in awk?

Tags:

awk

When processing input with awk, sometimes I want to edit one of the fields, without touching anything else. Consider this:

$ ls -l | awk 1
total 88
-rw-r--r-- 1 jack jack     8 Jun 19  2013 qunit-1.11.0.css
-rw-r--r-- 1 jack jack 56908 Jun 19  2013 qunit-1.11.0.js
-rw-r--r-- 1 jack jack  4306 Dec 29 09:16 test1.html
-rw-r--r-- 1 jack jack  5476 Dec  7 08:09 test1.js

If I don't edit any of the fields ($1, $2, ...), everything is preserved as it was. But suppose I want to keep only the first 3 characters of the first field:

$ ls -l | awk '{$1 = substr($1, 1, 3) } 1'
tot 88
-rw 1 jack jack 8 Jun 19 2013 qunit-1.11.0.css
-rw 1 jack jack 56908 Jun 19 2013 qunit-1.11.0.js
-rw 1 jack jack 4306 Dec 29 09:16 test1.html
-rw 1 jack jack 5476 Dec 7 08:09 test1.js

The original whitespace between all fields is replaced with a single space.

Is there a way to preserve the original whitespace between the fields?

UPDATE

In this sample, it's relatively easy to edit the first 4 fields. But what if I want to keep only the 1st letter of $6 (the month) in order to get this output:

-rw-r--r-- 1 jack jack     8 J 19  2013 qunit-1.11.0.css
-rw-r--r-- 1 jack jack 56908 J 19  2013 qunit-1.11.0.js
-rw-r--r-- 1 jack jack  4306 D 29 09:16 test1.html
-rw-r--r-- 1 jack jack  5476 D  7 08:09 test1.js
asked Dec 30 '13 07:12 by janos


2 Answers

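If you want to preserve the whitespace, you could also try the split function. In GNU Awk version 4 and later, split accepts an optional fourth argument: an array that receives the separators found between the fields. With " " as the field separator, element 0 of that array holds any leading whitespace and element n any trailing whitespace. For instance,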

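echo "a  2   4  6" | gawk '{
    n = split($0, a, " ", b)    # a[1..n]: fields, b[0..n]: whitespace around them
    a[3] = 7                    # edit the third field
    line = b[0]                 # leading whitespace, if any
    for (i = 1; i <= n; i++)
        line = line a[i] b[i]   # each field followed by its original separator
    print line
}'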

gives output

a  2   7  6
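If gawk 4 is not available, the same idea (keep the separators, edit one field, glue everything back together) can be approximated in POSIX awk with match(). The following is only a sketch applied to the question's sample line: the function name trim_month is made up for illustration, and it assumes fields are separated by spaces or tabs.

```shell
# Hypothetical helper: keep only the first letter of field 6,
# leaving every run of whitespace exactly as it was in the input.
trim_month() {
  awk '{
    rest = $0; out = ""; n = 0
    # Peel off one (separator, field) pair at a time.
    while (match(rest, /[^ \t]+/)) {
      sep   = substr(rest, 1, RSTART - 1)        # whitespace before the field
      field = substr(rest, RSTART, RLENGTH)      # the field itself
      rest  = substr(rest, RSTART + RLENGTH)     # what is still to be scanned
      if (++n == 6) field = substr(field, 1, 1)  # the edit: first letter only
      out = out sep field
    }
    print out rest                               # rest holds trailing whitespace
  }'
}

printf '%s\n' '-rw-r--r-- 1 jack jack     8 Jun 19  2013 qunit-1.11.0.css' | trim_month
```

Lines with fewer than six fields, such as "total 88", pass through unchanged because the edit is only applied when the counter reaches 6.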
answered Sep 25 '22 22:09 by Håkon Hægland


I know this is an old question, but I thought there had to be something better. This answer is for those who stumble onto this question while searching. Having looked around the web, I have to say @Håkon Hægland has the best answer, and that is what I used at first.

But here is my solution: use FPAT. It is a gawk variable (not available in other awk implementations) that holds a regular expression describing what a field should look like, rather than what separates the fields.

 FPAT = "([[:space:]]*[[:alnum:][:punct:][:digit:]]+)";

In this case, I am saying that a field starts with zero or more whitespace characters and continues with one or more letters, digits, or punctuation characters, that is, basically anything except whitespace. ([:digit:] is redundant here, since [:alnum:] already includes the digits.) Look up POSIX bracket expressions if this notation is unfamiliar.

Also, set the output field separator to the empty string with OFS = ""; because once a field has been modified, awk rebuilds the line by joining the fields with OFS, and its default value (a single space) would add an extra space between every pair of fields.

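I used the same example to test. Note that $6 is now " Jun", including its leading space, so substr($6, 1, 2) keeps that space plus the first letter.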

$ cat example-output.txt
-rw-r--r-- 1 jack jack     8 Jun 19  2013 qunit-1.11.0.css
-rw-r--r-- 1 jack jack 56908 Jun 19  2013 qunit-1.11.0.js
-rw-r--r-- 1 jack jack  4306 Dec 29 09:16 test1.html
-rw-r--r-- 1 jack jack  5476 Dec  7 08:09 test1.js
$ awk 'BEGIN { FPAT = "([[:space:]]*[[:alnum:][:punct:][:digit:]]+)"; OFS = ""; } { $6 = substr( $6, 1, 2);  print $0; }' example-output.txt
-rw-r--r-- 1 jack jack     8 J 19  2013 qunit-1.11.0.css
-rw-r--r-- 1 jack jack 56908 J 19  2013 qunit-1.11.0.js
-rw-r--r-- 1 jack jack  4306 D 29 09:16 test1.html
-rw-r--r-- 1 jack jack  5476 D  7 08:09 test1.js

Keep in mind that the fields now include their leading spaces. So if a field needs to be replaced by something else, pad the new value to the width of the original field:

len = length($1);
$1 = sprintf("%"(len)"s", "-42-");

$ awk 'BEGIN { FPAT = "([[:space:]]*[[:alnum:][:punct:][:digit:]]+)"; OFS = ""; } { if(NR==1){ len = length($1); $1 = sprintf("%"(len)"s", "-42-"); } print $0; }' example-output.txt
      -42- 1 jack jack     8 Jun 19  2013 qunit-1.11.0.css
-rw-r--r-- 1 jack jack 56908 Jun 19  2013 qunit-1.11.0.js
-rw-r--r-- 1 jack jack  4306 Dec 29 09:16 test1.html
-rw-r--r-- 1 jack jack  5476 Dec  7 08:09 test1.js
answered Sep 26 '22 22:09 by patrickh003