
Explanation about awk command using ORS, NR, FS, RS

Tags: linux, awk

I have a sample data set:

1
2
3
4
5
6

which the following awk command successfully parses into the desired output:

awk 'ORS=NR%3?FS:RS'

   1 2 3
   4 5 6

Can you please provide an explanation of what this command does? I'm not able to put the individual pieces together.

From what I understand:

  • ORS = output record separator - this is what we want the RS to be for the final output, which is a row of 3 columns

  • NR%3 = we want to group the data into rows of 3 elements

  • ?FS:RS - unsure how this fits into the command.

Thanks.

user asked May 06 '19 01:05


3 Answers

% is the modulo operator (see https://en.wikipedia.org/wiki/Modulo_operation) and NR%3?FS:RS is a ternary expression (see https://en.wikipedia.org/wiki/%3F:). Both are common constructs in many programming languages; they aren't specific to awk. For the meaning of ORS, NR, FS, and RS, see the awk man page.

Run this to see the values of the variables in the code before and after the command you're executing:

$ cat tst.awk
BEGIN {
    printf "%s=\"%s\"\n", "RS", RS
    printf "%s=\"%s\"\n", "FS", FS
}
{
    printf "---\n"

    printf "%s=\"%s\"\n", "$0", $0
    printf "%s=\"%s\"\n", "NR", NR
    printf "%s=\"%s\"\n", "NR%3", NR%3

    printf "before) %s=\"%s\"\n", "ORS", ORS

    ORS = (NR%3 ? FS : RS)

    printf "after) %s=\"%s\"\n", "ORS", ORS
}

$ awk -f tst.awk file
RS="
"
FS=" "
---
$0="1"
NR="1"
NR%3="1"
before) ORS="
"
after) ORS=" "
---
$0="2"
NR="2"
NR%3="2"
before) ORS=" "
after) ORS=" "
---
$0="3"
NR="3"
NR%3="0"
before) ORS=" "
after) ORS="
"
---
$0="4"
NR="4"
NR%3="1"
before) ORS="
"
after) ORS=" "
---
$0="5"
NR="5"
NR%3="2"
before) ORS=" "
after) ORS=" "
---
$0="6"
NR="6"
NR%3="0"
before) ORS=" "
after) ORS="
"

Notice on which input line numbers (NR) the Output Record Separator (ORS) becomes a newline (like RS) vs a blank char (like FS).

A more verbose way to write the same code would be:

$ cat tst.awk
{
    if (NR%3 == 0) {
        ORS = "\n"
    }
    else {
        ORS = " "
    }

    print
}

$ awk -f tst.awk file
1 2 3
4 5 6

and FYI the correct (more robust and clearer) way to write the concise, idiomatic code attempted in your question would be:

awk '{ORS=(NR%3?FS:RS)}1'

The parens around the ternary are required in some awks in some contexts and always improve readability, so just always use them. The original code relies on the result of the assignment to ORS being a non-null/non-zero value so that it acts as a true condition and invokes awk's default action of printing the current record. Only use the result of an action in that context when you NEED to; otherwise it will bite you one day when your data isn't exactly what you expected. Instead of leaving the assignment in a condition block, I moved it into an action block and then added a constant true condition, 1, afterwards to ensure every record gets printed regardless of what that assignment results in.
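
To make that risk concrete, here is a minimal sketch of one way the assignment-as-condition form can go wrong. It assumes you want no separator at all between the grouped values; the variable name sep is hypothetical and stands in for FS so the example doesn't depend on how a given awk treats an empty FS.

```shell
# Fragile: the assignment doubles as the condition. When the assigned
# value is the empty string the condition is false, so records 1, 2, 4
# and 5 are never printed.
fragile=$(printf '%s\n' 1 2 3 4 5 6 | awk -v sep= 'ORS=NR%3?sep:RS')

# Robust: the assignment lives in an action block and the constant
# true condition 1 prints every record.
robust=$(printf '%s\n' 1 2 3 4 5 6 | awk -v sep= '{ORS=(NR%3?sep:RS)}1')

printf 'fragile:\n%s\n' "$fragile"
printf 'robust:\n%s\n' "$robust"
```

The fragile form prints only 3 and 6, each on its own line, while the robust form prints 123 and 456.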

Ed Morton answered Nov 19 '22 06:11


Not an awk explanation, since you already have more than one good answer, but here are alternatives for the same task:

$ seq 6 | xargs -n3
1 2 3
4 5 6

$ seq 6 | paste - - -
1       2       3
4       5       6

With paste, the default delimiter is a tab, which you can change to a space with -d' '.
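
As a quick sketch of that option, the same input joined with a space delimiter might look like this (printf is used in place of seq only so the snippet doesn't depend on seq being installed):

```shell
# Three "-" operands make paste read three lines from stdin per output
# row; -d' ' swaps the default tab delimiter for a single space.
joined=$(printf '%s\n' 1 2 3 4 5 6 | paste -d' ' - - -)
printf '%s\n' "$joined"
```

This prints 1 2 3 on the first line and 4 5 6 on the second.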

$ seq 6 | pr -3ats' '
1 2 3
4 5 6

karakfa answered Nov 19 '22 07:11


If the record count is a multiple of 3 (NR%3 == 0), the 0 is treated as false, so the output record separator is set to the default record separator (RS), which is a newline.

If the record count is not a multiple of 3 (NR%3 != 0), the nonzero value is treated as true, so the output record separator is set to the default field separator (FS), which is a space.

The assignment itself acts as the condition. It is always true here, because the assigned value (a space or a newline) is a non-empty string, so awk runs the default action, which is to print the record.
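
The truth rules used above can be checked directly. This is a small sketch, not part of the original command; the labels in the output are just for illustration.

```shell
truthiness=$(awk 'BEGIN {
    # Numeric zero and the empty string are false in a condition;
    # any other number or non-empty string is true.
    print "0:",     (0   ? "true" : "false")
    print "3:",     (3   ? "true" : "false")
    print "empty:", (""  ? "true" : "false")
    print "space:", (" " ? "true" : "false")
}')
printf '%s\n' "$truthiness"
```

A single space counts as true, which is why the original one-liner still prints the records where ORS is set to FS.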

Mattias Larsson answered Nov 19 '22 07:11