(This is my first post here, so please forgive me if I am asking the question the wrong way.)
I am learning awk on OS X Mavericks and working through a tutorial on it.
I am trying to reproduce something similar to awk_example4a.awk from that tutorial.
So I came up with this awk program (or script; I am not sure what the right term is):
BEGIN { i=1 }
{
print "Line " i;
print "$1 is " $1,"\n$2 is " $2, "\n$3 is " $3;
FS=":";
$0=$0;
print "With the new FS - line " i;
print "$1 is " $1,"\n$2 is " $2, "\n$3 is " $3;
FS=" ";
i++;
}
And the input file looks like this:
A1 B1:B2 C2
A1:A2 B2:B3 C3
What I am trying to do is process each line/record first with the default FS (whitespace), then re-process the same record with a new FS (":"), and then restore the default FS before going on to the next record.
According to the tutorial, $0=$0 is supposed to get awk to re-evaluate the fields using the new field separator, which should give me output that looks like this:
Line 1
$1 is A1
$2 is B1:B2
$3 is C2
With the new FS - line 1
$1 is A1 B1
$2 is B2 C2
$3 is
Line 2
$1 is A1:A2
$2 is B2:B3
$3 is C3
With the new FS - line 2
$1 is A1
$2 is A2 B2
$3 is B3 C3
But instead, I get:
Line 1
$1 is A1
$2 is B1:B2
$3 is C2
With the new FS - line 1
$1 is A1
$2 is B1:B2
$3 is C2
Line 2
$1 is A1:A2
$2 is B2:B3
$3 is C3
With the new FS - line 2
$1 is A1:A2
$2 is B2:B3
$3 is C3
i.e. the fields have not been re-evaluated after the FS was changed.
So if $0=$0 doesn't work (and neither do things like $1=$1; $2=$2), how do I get awk to re-evaluate the same line using a different FS?
Thank you.
FreeBSD/OS X awk doesn't apply changes to FS (the field separator) until after the current record has finished processing. This behavior is actually POSIX-mandated (see below).
Workaround: Do not change FS; use the split() function instead:
{
print "Line " ++i
print "$1 is " $1 "\n$2 is " $2 "\n$3 is " $3
split($0, flds, ":") # split current line by ':' into array `flds`
print "With the new FS - line " i
print "field1 is " flds[1] "\nfield2 is " flds[2] "\nfield3 is " flds[3]
}
Compared with the script in the question:
The BEGIN block was eliminated by relying on uninitialized variables defaulting to 0 in numeric contexts.
The , separators were removed from the print statements, because each would insert a space (the default value of the output field separator, OFS), which is not needed in this case.
A trailing ; is not needed to terminate the statements.
Read on for the fun multi-platform compatibility details.
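As a variation on the split() workaround, the return value of split() can be used to loop over however many pieces a record contains, rather than hard-coding three fields. This is only a minimal sketch: the function name show_colon_fields and the output wording are made up for illustration, and the sample input is the one from the question.

```shell
# Hypothetical helper: reads records on stdin and lists their ":"-separated pieces.
show_colon_fields() {
    awk '
    {
        # split() returns the number of pieces it stored in the array flds;
        # FS itself is never touched, so $1, $2, ... stay whitespace-split.
        n = split($0, flds, ":")
        printf "Line %d has %d colon-separated field(s)\n", NR, n
        for (k = 1; k <= n; k++)
            printf "  field%d is %s\n", k, flds[k]
    }'
}

# Usage with the two sample records from the question.
printf 'A1 B1:B2 C2\nA1:A2 B2:B3 C3\n' | show_colon_fields
```

Because FS is left alone, this approach should not depend on when a given awk applies FS changes.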
The POSIX spec. for awk states (emphasis mine):
Before the first reference to a field in the record is evaluated, the record shall be split into fields, according to the rules in Regular Expressions, **using the value of FS that was current at the time the record was read**.
With respect to assigning a new value to $0 or a specific field, the same source states:
The symbol $0 shall refer to the entire record; setting any other field causes the re-evaluation of $0. Assigning to $0 shall reset the values of all other fields and the NF built-in variable.
In other words: Given that the re-assignment case doesn't state otherwise, the only reference to the scope of a given FS value in the POSIX spec. mandates that it be constant for a given input record.
There is definitely ambiguity, and it would certainly help if the spec. resolved it. That said, the conservative and thus safer interpretation is to assume that FS stays constant while a given record is being processed.
As such, it is FreeBSD/OS X awk that is the model citizen, whereas GNU awk and also mawk offer more flexibility by NOT playing by the rules and applying FS changes even to the current record on re-assigning to $0 or any specific field.
Note, however, that GNU awk (as of v4.1.1) doesn't even change that behavior with the --posix option, whose express intent is to result in POSIX-compliant behavior.
If I'm reading the POSIX spec. correctly (do tell me whether I am), this should be considered a bug.
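To see which of the two behaviors a particular awk exhibits, a quick probe along the following lines may help. It is a sketch, not a definitive test: the function name probe_fs and the messages are invented here, and the result depends entirely on the implementation and version that is installed.

```shell
# Hypothetical probe: pass the awk binary to test as $1 (defaults to "awk").
probe_fs() {
    printf 'a:b c\n' | "${1:-awk}" '
    {
        FS = ":"    # change the separator in the middle of processing the record
        $0 = $0     # re-assign the record, which may or may not trigger a re-split
        # With whitespace splitting $1 is "a:b"; with ":" splitting it is "a".
        if ($1 == "a")
            print "FS change applied to the current record"
        else
            print "FS change deferred to the next record"
    }'
}

probe_fs
```

Scripts that have to run unchanged on several platforms are probably better off with split(), which avoids the question altogether.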