In the GNU Awk manual, section 4.1.2 "Record Splitting with gawk", we can read:
When RS is a single character, RT contains the same single character. However, when RS is a regular expression, RT contains the actual input text that matched the regular expression.
This variable, RT, is very useful in some cases.
Similarly, we can set a regular expression as the field separator. For example, here we allow it to be either ";" or "|":
$ gawk -F';' '{print NF}' <<< "hello;how|are you"
2 # there are 2 fields, since ";" appears once
$ gawk -F'[;|]' '{print NF}' <<< "hello;how|are you"
3 # there are 3 fields, since ";" appears once and "|" also once
However, if we want to pack the data back together, we have no way to know which separator appeared between two fields. So if, in the previous example, I loop through the fields and print them back together using FS, it prints the whole expression in every case:
$ gawk -F'[;|]' '{for (i=1;i<=NF;i++) printf ("%s%s", $i, FS)}' <<< "hello;how|are you"
hello[;|]how[;|]are you[;|] # a literal "[;|]" shows in the place of FS
Is there a way to "repack" the fields using the specific field separator that was used to split each one of them, similar to what RT allows for records?
(The examples given here are deliberately simple, just to illustrate the point.)
Is there a way to "repack" the fields using the specific field separator used to split each one of them?
Using GNU awk's split(), which has an extra 4th parameter that stores the delimiters matched by the supplied regex:
s="hello;how|are you"
awk 'split($0, flds, /[;|]/, seps) {for (i=1; i in seps; i++) printf "%s%s", flds[i], seps[i]; print flds[i]}' <<< "$s"
hello;how|are you
A more readable version:
s="hello;how|are you"
awk 'split($0, flds, /[;|]/, seps) {
   for (i=1; i in seps; i++)
      printf "%s%s", flds[i], seps[i]
   print flds[i]
}' <<< "$s"
Note the 4th parameter of split(), seps, which stores an array of the text matched by the regular expression given as the 3rd parameter, i.e. /[;|]/.
Of course, it is not as short and simple as the record-level approach with RS, ORS and RT, which can be written as:
awk -v RS='[;|]' '{ORS = RT} 1' <<< "$s"