
Can RS be set "empty" to split string characters to records?

Tags: awk, gawk

Is there a way in awk (most likely gawk) to set the record separator RS to an empty value so that each character of a string is processed as a separate record? Kind of like setting FS to empty to put each character in its own field:

$ echo abc | awk -F '' '{print $2}'
b

but to separate them each as a separate record, like:

$ echo abc | awk -v RS='?' '{print $0}'
a
b
c

The most obvious one:

$ echo abc | awk -v RS=''  '{print $0}'
abc

didn't reward me (that one is apparently meant for something else, per the GNU awk documentation).

Am I basically stuck using for etc.?

EDIT:

@xhienne's answer was what I was looking for but even using that (20 chars and a questionable variable A :):

$ echo  abc | awk -v A="\n" -v RS='(.)' -v ORS="" '{print(RT==A?NR:RT)}'
abc4

wouldn't help me shorten my earlier code using length. Then again, how could I win the Pyth code: +Qfql+Q :D.

James Brown asked Dec 23 '16 10:12



1 Answer

If you just want to print one character per line, @klashxx's answer is OK. But since you are golfing, sed 's/./&\n/g' (GNU sed) would be shorter.

If you truly want a separate record for each character, the closest solution I have found is:

echo -n abc | awk -v RS='(.)' '{ print RT }'

(use gawk; your input character is in RT, not $1)

[update] If RS is set to the null string, awk takes it to mean that records are separated by blank lines. If I had just defined RS='.', the record separator would have been a literal dot (i.e. a fixed string). But when RS is longer than one character, gawk treats it as a regex. So what I did here is give gawk a regex meaning "each character" as the record separator. I also use another gawk feature: the special variable RT (record terminator) holds the string that matched the regex.
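
To make the difference concrete, here is a minimal sketch contrasting a single-character RS with a regex RS. It assumes gawk is installed under the name gawk; the function names split_on_dot and chars_as_records are made up for illustration and are not part of awk.

```shell
# Single-character RS: the dot is a literal separator, not a regex.
split_on_dot() {
    awk -v RS='.' '{ print NR ": " $0 }'
}

# Multi-character RS: gawk treats '(.)' as a regex matching any one character.
# Each record ($0) is empty; the character that matched is in RT (gawk only).
chars_as_records() {
    gawk -v RS='(.)' '{ print NR ": " RT }'
}

printf 'a.b.c' | split_on_dot      # records are a, b, c
printf 'abc'   | chars_as_records  # records are empty, RT is a, b, c
```

Both pipelines should number three records. In the first the text sits in $0; in the second it sits in RT, because every input character was consumed as a separator.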

Here are the relevant parts of the gawk manual:

Normally, records are separated by newline characters. You can control how records are separated by assigning values to the built-in variable RS. If RS is any single character, that character separates records. Otherwise, RS is a regular expression. Text in the input that matches this regular expression separates the record.

If RS is set to the null string, then records are separated by blank lines.

Gawk sets RT to the input text that matched the character or regular expression specified by RS.
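
The blank-line behaviour quoted above is easy to check, and it explains why RS='' printed abc unchanged in the question. A minimal sketch (the function name paragraphs is hypothetical, chosen only for this example):

```shell
# RS='' switches awk to "paragraph mode": blank lines separate records,
# so a record may span several lines.
paragraphs() {
    awk -v RS='' '{ print NR ": " $0 }'
}

printf 'a\nb\n\nc\n' | paragraphs
```

Here the first record is the two-line block a, b and the second record is c, so an empty RS never splits a string into individual characters.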

xhienne answered Nov 22 '22 12:11
