I have a sample data set of:
1
2
3
4
5
6
which is successfully parsed by the following awk command into the desired output
awk 'ORS=NR%3?FS:RS'
1 2 3
4 5 6
Can you please provide an explanation of what this command does? I'm not able to put the individual pieces together.
From what I understand:
ORS = output record separator - this is what we want the RS to be for the final output, which is a row of 3 columns
NR%3 = we want to group the data into rows of 3 elements
?FS:RS - unsure how this fits into the command.
Thanks.
% is the modulo operator (see https://en.wikipedia.org/wiki/Modulo_operation) and NR%3?FS:RS is a ternary expression (see https://en.wikipedia.org/wiki/%3F:). Those are both common constructs in many programming languages; they aren't specific to awk. For the meaning of ORS, NR, FS, and RS, just see the awk man page.
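As a quick illustration of how the modulo and the ternary combine, here is a minimal sketch. The "FS" and "RS" strings are only labels chosen for this example; they stand in for the separator the ternary would select.

```shell
# For each input record, show NR, NR%3, and which separator the ternary selects.
seq 6 | awk '{ print NR, NR%3, (NR%3 ? "FS" : "RS") }'
```

Records 3 and 6 are the only ones where NR%3 is 0, so they are the only ones labelled RS.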
Run this to see the values of the variables in the code before and after the command you're executing:
$ cat tst.awk
BEGIN {
    printf "%s=\"%s\"\n", "RS", RS
    printf "%s=\"%s\"\n", "FS", FS
}
{
    printf "---\n"
    printf "%s=\"%s\"\n", "$0", $0
    printf "%s=\"%s\"\n", "NR", NR
    printf "%s=\"%s\"\n", "NR%3", NR%3
    printf "before) %s=\"%s\"\n", "ORS", ORS
    ORS = (NR%3 ? FS : RS)
    printf "after) %s=\"%s\"\n", "ORS", ORS
}
.
$ awk -f tst.awk file
RS="
"
FS=" "
---
$0="1"
NR="1"
NR%3="1"
before) ORS="
"
after) ORS=" "
---
$0="2"
NR="2"
NR%3="2"
before) ORS=" "
after) ORS=" "
---
$0="3"
NR="3"
NR%3="0"
before) ORS=" "
after) ORS="
"
---
$0="4"
NR="4"
NR%3="1"
before) ORS="
"
after) ORS=" "
---
$0="5"
NR="5"
NR%3="2"
before) ORS=" "
after) ORS=" "
---
$0="6"
NR="6"
NR%3="0"
before) ORS=" "
after) ORS="
"
Notice on which input line numbers (NR) the Output Record Separator (ORS) becomes a newline (like RS) vs a blank char (like FS).
A more verbose way to write the same code would be:
$ cat tst.awk
{
    if (NR%3 == 0) {
        ORS = "\n"
    }
    else {
        ORS = " "
    }
    print
}
$ awk -f tst.awk file
1 2 3
4 5 6
and FYI the correct (more robust and clearer) way to write the concise, idiomatic code attempted in your question would be:
awk '{ORS=(NR%3?FS:RS)}1'
The parens around the ternary are required in some awks in some contexts and always improve readability, so just always use them. The original code relies on the assignment to ORS producing a non-null, non-zero value so that it is a true condition and invokes awk's default action of printing the current record. Only use the result of an action in that context when you NEED to, otherwise it will bite you one day when your data isn't exactly what you expected. Instead of leaving the assignment in the condition part, I moved it into an action block and then added a constant true condition afterwards, 1, to ensure every record gets printed regardless of what that assignment results in.
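To make that failure mode concrete, here is a minimal sketch. It assumes FS has been deliberately set to the empty string, which is a trigger picked for illustration and not something from the question. The assignment then evaluates to an empty (false) string for two records out of three.

```shell
# Assumption: FS is emptied on purpose so that ORS=FS evaluates to "" (false).
seq 6 | awk -v FS= 'ORS=NR%3?FS:RS'        # assignment used as the condition
seq 6 | awk -v FS= '{ORS=(NR%3?FS:RS)}1'   # assignment in an action, then constant-true 1
```

With the awks I would expect most people to have (e.g. gawk or mawk), the first command prints only 3 and 6, because records 1, 2, 4, and 5 never satisfy the condition. The second prints every record, giving 123 and 456.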
Not the awk explanation, since you already have more than one good answer, but here are alternatives for the same task:
$ seq 6 | xargs -n3
1 2 3
4 5 6
$ seq 6 | paste - - -
1 2 3
4 5 6
With paste, the default delimiter is a tab, which you can change to a space with -d' '.
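For completeness, the space-delimited paste variant would look like this:

```shell
# Join every three input lines, separating them with a space instead of the default tab.
seq 6 | paste -d' ' - - -
```

This prints 1 2 3 and 4 5 6 on two lines, matching the awk output.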
$ seq 6 | pr -3ats' '
1 2 3
4 5 6
If the record number is a multiple of 3 (NR%3 == 0), the 0 is treated as false, so the output record separator is set to the record separator (RS), which defaults to a newline.
If the record number is not a multiple of 3 (NR%3 != 0), the nonzero value is treated as true, so the output record separator is set to the field separator (FS), which defaults to a space.
The whole assignment is then used as the condition. Its value is the string just assigned, which is non-empty with the default FS and RS, so the condition is true and awk does the default action, which is to print the record.
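The same logic works for any row width. As a sketch, the 3 can be replaced by a variable passed with -v; the name n is made up for this example, and the parenthesized action form is used instead of the bare condition.

```shell
# n is a hypothetical variable holding the number of columns per output row.
seq 6 | awk -v n=2 '{ ORS = (NR%n ? FS : RS) } 1'
```

With n=2 this prints 1 2, 3 4, and 5 6 on three lines.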