gawk FS to split record into individual characters

Tags:

gawk

If the field separator is the empty string, each character becomes a separate field:

$ echo hello | awk -F '' -v OFS=, '{$1 = NF OFS $1} 1'
5,h,e,l,l,o

However, if FS is a regex that can possibly match zero times, the same behaviour does not occur:

$ echo hello | awk -F ' *' -v OFS=, '{$1 = NF OFS $1} 1'
1,hello

Anyone know why that is? I could not find anything in the gawk manual. Is FS="" just a special case?

I'm most interested in understanding why the 2nd case does not split the record into more fields. It's as if awk is treating FS=" *" like FS=" +"


asked Feb 26 '14 14:02

1 Answer

Interesting question!

I just pulled the gawk 4.1.0 source, and I think the answer is in the file field.c.

line 371:
 * re_parse_field --- parse fields using a regexp.
 *
 * This is called both from get_field() and from do_split()
 * via (*parse_field)().  This variation is for when FS is a regular
 * expression -- either user-defined or because RS=="" and FS==" "
 */
static long
re_parse_field(lo...

Also see this line (line 425):

if (REEND(rp, scan) == RESTART(rp, scan)) {   /* null match */

This is the case that <space>* hits in your question. On a null match the implementation does not increment nf, so it treats the whole line as one single field. Note that this function is used by do_split() as well.
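To see what skipping null matches means in practice, here is a small sketch. It assumes gawk is installed under the name gawk, and the helper name count_fields is made up for illustration. With FS=' *', real runs of spaces still separate fields; only the empty matches between letters are ignored:

```shell
# Hypothetical helper: print NF followed by the fields, comma-separated,
# using the regex field separator ' *' from the question.
count_fields() {
    printf '%s\n' "$1" | gawk -F ' *' -v OFS=, '{$1 = NF OFS $1} 1'
}

count_fields 'hello'     # no spaces at all, so one field: 1,hello
count_fields 'a b  c'    # each run of spaces separates fields: 3,a,b,c
```

So FS=' *' ends up behaving like FS=' +' here, which matches the observation in the question.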

As for the other cases: if FS is the null string, gawk splits each character into its own field. The gawk documentation states this clearly, and the code shows it as well:

line 613:
 * null_parse_field --- each character is a separate field
 *
 * This is called both from get_field() and from do_split()
 * via (*parse_field)().  This variation is for when FS is the null string.
 */
static long
null_parse_field(long up_to,

If FS is a single character (other than a space), awk does not treat it as a regex. The documentation mentions this too, and so does the code:

line 667:
 * sc_parse_field --- single character field separator
 *
 * This is called both from get_field() and from do_split()
 * via (*parse_field)().  This variation is for when FS is a single character
 * other than space.
 */
static long
sc_parse_field(l

If we read the function, there is no regex match handling in it.
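A quick way to check the single-character case is to use a separator that would mean something special as a regex. This is only a sketch, and the helper name nf_with_fs is made up for illustration:

```shell
# Hypothetical helper: print the number of fields for a given separator.
# $1 is the field separator, $2 is the record to split.
nf_with_fs() {
    printf '%s\n' "$2" | awk -F "$1" '{print NF}'
}

nf_with_fs '|' 'a|b|c'    # 3: "|" is taken literally, not as regex alternation
nf_with_fs '.' 'a.b.c'    # 3: "." is taken literally, not as "any character"
nf_with_fs '.' 'abc'      # 1: there is no literal dot in the record
```

If "." were treated as a regex, the last call would match every character instead of leaving the record as one field.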

The comments on re_parse_field() and sc_parse_field() say that do_split() calls them too. That explains why the following command prints 1 instead of 3:

kent$  echo "foo"|awk '{split($0,a,/ */);print length(a)}'
1
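For comparison, here is a sketch that runs both kinds of separator through split() on the same input. It assumes gawk (splitting on a null string is a gawk extension), and the helper name split_counts is made up for illustration:

```shell
# Hypothetical helper: element counts for a regex separator that can match
# the empty string versus a null-string separator.
split_counts() {
    printf '%s\n' "$1" | gawk '{
        n_re   = split($0, a, / */)   # regex path: null matches are skipped
        n_null = split($0, b, "")     # null string: one element per character
        print n_re, n_null
    }'
}

split_counts 'foo'    # prints: 1 3
```

So if the goal is one element per character, the null string is the separator to use, not a regex that happens to match nothing.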

Note: to keep this post short, I did not paste the complete code here. The full source is available at:

http://git.savannah.gnu.org/cgit/gawk.git/

answered Oct 02 '22 18:10

Kent

glenn jackman
