In the GNU Awk User's Guide, section 4.1.2 "Record Splitting with gawk", we can read:
When RS is a single character, RT contains the same single character. However, when RS is a regular expression, RT contains the actual input text that matched the regular expression.
This variable RT is very useful in some cases.
Similarly, we can set a regular expression as the field separator. For example, here we allow it to be either ";" or "|":
$ gawk -F';' '{print NF}' <<< "hello;how|are you"
2 # there are 2 fields, since ";" appears once
$ gawk -F'[;|]' '{print NF}' <<< "hello;how|are you"
3 # there are 3 fields, since ";" appears once and "|" also once
However, if we want to pack the data again, we have no way of knowing which separator appeared between two fields, because FS holds the regular expression itself rather than the text it matched. So if, in the previous example, I loop through the fields and print them back together using FS, it prints the whole expression every time:
$ gawk -F'[;|]' '{for (i=1;i<=NF;i++) printf ("%s%s", $i, FS)}' <<< "hello;how|are you"
hello[;|]how[;|]are you[;|] # a literal "[;|]" shows in the place of FS
Is there a way to "repack" the fields using the specific field separator that split each one of them, similar to what RT allows for records?
(the examples given in the question are rather simple, but just to show the point)
Is there a way to "repack" the fields using the specific field separator used to split each one of them
Yes, by using gawk's split(), which accepts an optional fourth argument: an array that receives the text matched by the separator regex between each pair of fields:
s="hello;how|are you"
gawk 'split($0, flds, /[;|]/, seps) {for (i=1; i in seps; i++) printf "%s%s", flds[i], seps[i]; print flds[i]}' <<< "$s"
hello;how|are you
A more readable version:
s="hello;how|are you"
gawk 'split($0, flds, /[;|]/, seps) {
    for (i=1; i in seps; i++)
        printf "%s%s", flds[i], seps[i]
    print flds[i]
}' <<< "$s"
Note the fourth parameter, seps: split() fills this array with the text matched by the regular expression given as the third parameter, /[;|]/, so that seps[i] is the separator found between flds[i] and flds[i+1].
Of course, this is not as short and simple as the record-based equivalent with RS, ORS and RT, which can be written as:
gawk -v RS='[;|]' '{ORS = RT} 1' <<< "$s"