
Is there a field that stores the exact field separator used when FS is a regular expression, equivalent to RT for RS?

Tags: awk, gnu

In section 4.1.2, "Record Splitting with gawk", of the GNU Awk manual we can read:

When RS is a single character, RT contains the same single character. However, when RS is a regular expression, RT contains the actual input text that matched the regular expression.

This variable RT is very useful in some cases.

Similarly, we can set a regular expression as the field separator. For example, here we allow it to be either ";" or "|":

$ gawk -F';' '{print NF}' <<< "hello;how|are you"
2  # there are 2 fields, since ";" appears once
$ gawk -F'[;|]' '{print NF}' <<< "hello;how|are you"
3  # there are 3 fields, since ";" appears once and "|" also once

However, if we want to pack the data again, we don't have a way to know which separator appeared between two fields. So if in the previous example I want to loop through the fields and print them together again by using FS, it prints the whole expression in every case:

$ gawk -F'[;|]' '{for (i=1;i<=NF;i++) printf ("%s%s", $i, FS)}' <<< "hello;how|are you"
hello[;|]how[;|]are you[;|]  # a literal "[;|]" shows in the place of FS

Is there a way to "repack" the fields using the specific field separator used to split each one of them, similarly to what RT would allow to do?

(The examples in the question are rather simple; they are just meant to illustrate the point.)

asked Jan 04 '21 09:01 by fedorqui 'SO stop harming'



1 Answer

Is there a way to "repack" the fields using the specific field separator used to split each one of them

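Yes: GNU awk's split() accepts an optional 4th parameter, an array that receives the delimiter text actually matched by the supplied regex. (This is a gawk extension, so if awk on your system is not GNU awk, call gawk explicitly.)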

s="hello;how|are you"
awk 'split($0, flds, /[;|]/, seps) {for (i=1; i in seps; i++) printf "%s%s", flds[i], seps[i]; print flds[i]}' <<< "$s"

hello;how|are you

A more readable version:

s="hello;how|are you"
awk 'split($0, flds, /[;|]/, seps) {
   for (i=1; i in seps; i++)
      printf "%s%s", flds[i], seps[i]
   print flds[i]
}' <<< "$s"

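Note the 4th parameter of split(), seps: it is filled with the text matched by the regular expression given as the 3rd parameter, here /[;|]/. seps[i] holds the separator found between flds[i] and flds[i+1], which is why the loop runs while i is in seps and the last field is printed separately.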

Of course, this is not as short and simple as the RS, ORS and RT combination, which can be written as:

awk -v RS='[;|]' '{ORS = RT} 1' <<< "$s"
answered Sep 28 '22 05:09 by anubhava
