I have a file foo that has the following data:
A<|>B<|>C<|>D
1<|>2<|>3<|>4
I want to properly access each column using awk, but it isn't properly interpreting the field separator.
When I run:
head foo | \
awk 'BEGIN {FS="<|>"} {out=""; for(i=1;i<=NF;i++){out=out" "$i}; print out}'
instead of printing
A B C D
1 2 3 4
it prints
A | B | C | D
1 | 2 | 3 | 4
What's the reason behind this?
The pipe is a special character in a regex, so you need to escape it with a backslash. But this backslash is also a special character for the string literal, so it needs to be escaped again. So you end up with the following:
awk -F '<\\|>' '{$1=$1}1'
awk 'BEGIN {FS="<\\|>"} {$1=$1}1'
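The reason for this syntax is explained quite well here: http://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps. In short, the expression is parsed twice: first as a string literal, where \\ collapses to a single backslash, and then as a regex, where \| means a literal pipe.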
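To see the difference concretely, here is a small sketch that counts fields with and without the escape. The sample line is copied from the question; the variable names are only illustrative. With the unescaped separator the regex means "< or >", so each | is left behind as a field of its own.

```shell
line='A<|>B<|>C<|>D'

# Unescaped: FS is the regex "<" OR ">", so every "|" becomes a separate field.
unescaped=$(printf '%s\n' "$line" | awk -F '<|>' '{print NF}')

# Escaped: FS matches the literal three-character string "<|>".
escaped=$(printf '%s\n' "$line" | awk -F '<\\|>' '{print NF}')

echo "unescaped NF=$unescaped, escaped NF=$escaped"
```

This should report 7 fields for the unescaped separator (A, |, B, |, C, |, D) and 4 for the escaped one.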
Awk reads your separator as a regex, "< or >". You have to escape the pipe character (twice, since dynamic regexps such as the field separator are scanned twice): "<\\|>".
You can also specify the field separator as a command-line option:
awk -F '<\\|>' '{out=""; for(i=1;i<=NF;i++){out=out" "$i}; print out}' <<< 'A<|>B<|>C<|>D'
A B C D
Depending on your version of awk, you might get away with just single escaping. For me, mawk 1.3.3 works with both -F '<\|>' and -F '<\\|>', and gawk 4.0.1 requires -F '<\\|>'. I'm not fully sure which way POSIX awk goes, but running gawk in --posix mode requires the double escapes, too.
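If you would rather not depend on how many backslashes a particular awk expects, one alternative (not mentioned in the answers above, so treat it as a suggestion) is to put the pipe inside a bracket expression. Inside [...] the | has no special meaning, so no escaping is involved. A minimal sketch using the sample data from the question:

```shell
# [|] is a bracket expression containing only "|", so no backslashes are needed.
out=$(printf '%s\n' 'A<|>B<|>C<|>D' '1<|>2<|>3<|>4' |
  awk -F '<[|]>' '{$1=$1} 1')   # $1=$1 rebuilds the record with the default OFS (a space)

printf '%s\n' "$out"
```

This should print "A B C D" and "1 2 3 4" on separate lines.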