AWK: Is there a way to set OFS as FS if this one is a regex?

Tags: regex, awk

In awk, the field (or record) separator FS (or RS) can be set as a regular expression. It works great for getting any individual field, but once you assign to one of these fields, the field separators are "gone".

echo "a|b-c|d" | awk 'BEGIN{FS="[|-]"} {$3="z"}1'
a b z d

In this case the output field separator OFS defaults to a single space.

Unfortunately a statement like OFS=FS="[|-]" does not work, because it sets OFS to the literal string [|-].
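To make the failure concrete, here is a small sketch that runs the chained assignment on the same input. It uses nothing beyond the FS and OFS variables already discussed:

```shell
# OFS receives the text of the regex, not the separator that was matched
echo "a|b-c|d" | awk 'BEGIN{OFS=FS="[|-]"} {$3="z"}1'
# prints: a[|-]b[|-]z[|-]d
```

The input is still split on | and -, but every separator in the rebuilt record is the four-character string [|-].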

I understand that it might get tricky for awk to select the output field separator if there are several choices, but when no new fields are added, the existing separators could be kept.

So, is there an easy way to set OFS to be the exact same regex as FS, such that I get this?

echo "a|b-c|d" | awk '... {$3="z"}1'
a|b-z|d

Alternatively, is there a way to capture all separators, in an array for example?

The same question also applies to the record separator RS (and its associated ORS).

asked Sep 05 '16 07:09

oliv

2 Answers

As you already mentioned, OFS is always a literal string, so there is no way to make it follow whichever separator FS matched at each position. If the regex were in RS instead of FS, you could use RT (in fact, I just saw that anubhava's answer does this, nice!).

However, there is another way if you have GNU awk: as seen in "column replacement with awk, with retaining the format" (Ed Morton's answer), you can use split() and, especially, its 4th argument. Why? Because it stores the separator between every pair of consecutive pieces:

gawk 'BEGIN{FS="[|-]"}                     # set FS
     {split($0, a, FS, seps)               # split $0 into the array a based on FS and ...
                                           # ... store the separators in the array seps
      a[3]="z"                             # change the 3rd field
      for (i=1;i<=NF;i++)                  # print the data back
           printf "%s%s", a[i], seps[i]    # keeping the separators
      print ""                             # print a new line
     }'

As one-liner:

$ gawk 'BEGIN{FS="[|-]"} {split($0, a, FS, seps); a[3]="z"; for (i=1;i<=NF;i++) printf "%s%s", a[i], seps[i]; print ""}' <<< "a|b-c|d"
a|b-z|d

split(string, array [, fieldsep [, seps ] ])

Divide string into pieces separated by fieldsep and store the pieces in array and the separator strings in the seps array. The first piece is stored in array[1], the second piece in array[2], and so forth. The string value of the third argument, fieldsep, is a regexp describing where to split string (much as FS can be a regexp describing where to split input records). If fieldsep is omitted, the value of FS is used. split() returns the number of elements created. seps is a gawk extension, with seps[i] being the separator string between array[i] and array[i+1]. If fieldsep is a single space, then any leading whitespace goes into seps[0] and any trailing whitespace goes into seps[n], where n is the return value of split() (i.e., the number of elements in array).
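If GNU awk is not available, the same idea can be approximated in POSIX awk with match(), RSTART and RLENGTH. The following is only a sketch under some assumptions: replace_field is a hypothetical helper name, the separator is treated as a plain regex (the special handling FS gets for a single space or a single character is not reproduced), and the regex must not match the empty string.

```shell
# replace_field REGEX N VALUE: hypothetical helper, reads records on stdin and
# replaces the N-th piece with VALUE while keeping the original separators.
replace_field() {
  awk -v re="$1" -v n="$2" -v val="$3" '
  {
    rest = $0; out = ""; i = 0
    # Peel off one piece and the separator that follows it, one at a time.
    while (match(rest, re) && RLENGTH > 0) {
      i++
      piece = substr(rest, 1, RSTART - 1)
      sep   = substr(rest, RSTART, RLENGTH)
      out   = out (i == n + 0 ? val : piece) sep
      rest  = substr(rest, RSTART + RLENGTH)
    }
    # What remains is the last piece; no separator follows it.
    i++
    print out (i == n + 0 ? val : rest)
  }'
}

echo "a|b-c|d" | replace_field '[|-]' 3 z
# prints: a|b-z|d
```

Because the separators are copied verbatim from the input, no OFS is involved at all; the cost is that the record is scanned by hand instead of through $1..$NF.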

answered Oct 20 '22 00:10

fedorqui 'SO stop harming'

awk rewrites each record using OFS if you change any field value using $N=<whatever> (where N is the field number).
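A quick way to see this rebuild happen is to print the record before and after a no-op field assignment. This is only an illustrative sketch, using the same sample input and the default OFS:

```shell
# The first print shows the untouched record; $1=$1 forces awk to rebuild $0 with OFS
echo "a|b-c|d" | awk -F'[|-]' '{print; $1=$1; print}'
# prints:
# a|b-c|d
# a b c d
```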

Since your FS is a regex that matches several delimiters, you cannot use OFS=FS: OFS is always a literal string.

If you have GNU awk, then you can use a solution based on RS and RT:

s="a|b-c|d"
gawk -v RS='[-|]' 'NR==3{$0="z"} {printf "%s%s", $0, RT}' <<< "$s"

a|b-z|d

Alternatively you can use sed:

s="a|b-c|d"
sed -E 's/^(([^|-]+[|-]){2})[^|-]+/\1z/' <<< "$s"

a|b-z|d

answered Oct 19 '22 22:10

anubhava
