 

AWK: Is there a way to set OFS to FS when FS is a regex?

Tags:

regex

awk

In awk, the field separator FS (or the record separator RS) can be set to a regular expression. That works great for extracting any individual field, but once you assign to one of these fields, the original field separators are "gone":

echo "a|b-c|d" | awk 'BEGIN{FS="[|-]"} {$3="z"}1'
a b z d

In this case the record is rebuilt with the output field separator OFS, which defaults to a single space.

Unfortunately, a statement like OFS=FS="[|-]" does not work, because it sets OFS to the literal string [|-].
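For illustration, this is what that statement actually produces: FS treats "[|-]" as a regex, but OFS is copied verbatim whenever $0 is rebuilt, so the bracket expression itself appears between the fields (any POSIX awk should behave the same way here):

```shell
# OFS=FS="[|-]" assigns the same string to both variables.
# FS interprets it as a regex; OFS is output literally when $0 is rebuilt.
echo "a|b-c|d" | awk 'BEGIN{OFS=FS="[|-]"} {$3="z"}1'
# prints: a[|-]b[|-]z[|-]d
```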

I understand that it might be tricky for awk to choose the output field separator when there are several candidates, but when no new fields are added, the existing separators could simply be kept.

So, is there an easy way to set OFS to be the exact same regex as FS, such that I get this?

echo "a|b-c|d" | awk '... {$3="z"}1'
a|b-z|d

Alternatively, is there a way to capture all the separators, in an array for example?

The same question also applies to the record separator RS (and its associated ORS).

asked Sep 05 '16 07:09 by oliv




2 Answers

As you already mentioned, there is no way to set OFS dynamically to whatever separator FS matched in each case. If the regex were in RS instead of FS, you could use RT (in fact, I just saw that anubhava's answer does this, nice!).

However, there is another way if you have GNU awk: as seen in column replacement with awk, with retaining the format (Ed Morton's answer), you can use split() and, especially, its 4th argument. Why? Because it stores the separator found between each pair of pieces:

gawk 'BEGIN{FS="[|-]"}                     # set FS
     {split($0, a, FS, seps)               # split based on FS: the pieces go into a[]
                                           # and the separators into seps[]
      a[3]="z"                             # change the 3rd field
      for (i=1;i<=NF;i++)                  # print the data back
           printf "%s%s", a[i], seps[i]    # keeping the separators
      print ""                             # print a new line
     }'

As a one-liner:

$ gawk 'BEGIN{FS="[|-]"} {split($0, a, FS, seps); a[3]="z"; for (i=1;i<=NF;i++) printf "%s%s", a[i], seps[i]; print ""}' <<< "a|b-c|d"
a|b-z|d

split(string, array [, fieldsep [, seps ] ])

Divide string into pieces separated by fieldsep and store the pieces in array and the separator strings in the seps array. The first piece is stored in array[1], the second piece in array[2], and so forth. The string value of the third argument, fieldsep, is a regexp describing where to split string (much as FS can be a regexp describing where to split input records). If fieldsep is omitted, the value of FS is used. split() returns the number of elements created. seps is a gawk extension, with seps[i] being the separator string between array[i] and array[i+1]. If fieldsep is a single space, then any leading whitespace goes into seps[0] and any trailing whitespace goes into seps[n], where n is the return value of split() (i.e., the number of elements in array).

answered Oct 20 '22 00:10 by fedorqui 'SO stop harming'


awk rebuilds the record using OFS whenever you change any field value with $N=<whatever> (where N is a field number).

Since you're using multiple delimiters in FS, you cannot use OFS=FS.
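A concrete illustration of that limitation: whatever single string OFS holds is placed between every pair of fields when $0 is rebuilt. With OFS="|", for instance, the original - between b and c is lost:

```shell
# Assigning to $3 rebuilds $0 with OFS between every pair of fields,
# so the "-" between b and c becomes "|" as well.
echo "a|b-c|d" | awk 'BEGIN{FS="[|-]"; OFS="|"} {$3="z"}1'
# prints: a|b|z|d
```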

If you have GNU awk, you can use a solution based on RS and RT:

s="a|b-c|d"
awk -v RS='[-|]' 'NR==3{$0="z"} {printf "%s%s", $0, RT}' <<< "$s"

a|b-z|d
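Both answers depend on gawk extensions (the fourth argument of split(), and RT). If only a POSIX awk is available, one possible workaround, not taken from either answer, is to walk the record with match() and store every separator that FS matched. This sketch assumes FS is a regex that can never match the empty string (otherwise the loop would not advance), and it does not reproduce the special whitespace handling of the default FS=" ":

```shell
# Portable sketch (POSIX awk: match(), RSTART, RLENGTH, substr()).
# Assumes FS never matches the empty string and is not the default " ".
echo "a|b-c|d" | awk '
BEGIN { FS = "[|-]" }
{
    rest = $0; n = 0
    while (match(rest, FS)) {                  # leftmost FS match in rest
        n++
        fld[n] = substr(rest, 1, RSTART - 1)   # text before the separator
        sep[n] = substr(rest, RSTART, RLENGTH) # the separator itself
        rest = substr(rest, RSTART + RLENGTH)  # continue after it
    }
    fld[n + 1] = rest; sep[n + 1] = ""         # last field, nothing after it
    fld[3] = "z"                               # change the 3rd field
    out = ""
    for (i = 1; i <= n + 1; i++)
        out = out fld[i] sep[i]
    print out
}'
# prints: a|b-z|d
```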

Alternatively, you can use sed:

s="a|b-c|d"
sed -E 's/^(([^|-]+[|-]){2})[^|-]+/\1z/' <<< "$s"

a|b-z|d

answered Oct 19 '22 22:10 by anubhava