
How to split a delimited string into an array in awk?


Have you tried:

echo "12|23|11" | awk '{split($0,a,"|"); print a[3],a[2],a[1]}'

To split a string to an array in awk we use the function split():

 awk '{split($0, a, ":")}'
 #           ^^  ^  ^^^
 #            |  |   |
 #       string  |   delimiter
 #               |
 #               array to store the pieces

If no separator is given, split() uses FS, which defaults to a single space; in that case the string is split on runs of whitespace:

$ awk '{split($0, a); print a[2]}' <<< "a:b c:d e"
c:d
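
To illustrate the default behaviour, here is a small sketch (the sample string and the names `n` and `parts` are made up for the demonstration). With the default FS, runs of spaces and tabs count as one separator, and leading or trailing whitespace produces no empty pieces:

```shell
# Default FS is a single space: runs of spaces/tabs act as one separator,
# and leading/trailing whitespace is ignored.
default_split=$(awk 'BEGIN {
  n = split("  a:b \t c:d   e ", parts)   # no third argument, so FS is used
  print n ": " parts[1] "," parts[2] "," parts[3]
}')
echo "$default_split"
```

This should print `3: a:b,c:d,e`.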

We can give a separator, for example ":":

$ awk '{split($0, a, ":"); print a[2]}' <<< "a:b c:d e"
b c

Which is equivalent to setting it through the FS:

$ awk -F: '{split($0, a); print a[2]}' <<< "a:b c:d e"
b c

The separator can also be a regexp, for example one or more colons (this is standard awk, not specific to gawk):

$ awk '{split($0, a, ":+"); print a[2]}' <<< "a:::b c::d e" # note the repeated colons
b c

In gawk you can even see what the separator was at each step by using the fourth parameter:

$ awk '{split($0, a, ":+", sep); print a[2]; print sep[1]}' <<< "a:::b c::d e"
b c
:::

Let's quote the man page of GNU awk:

split(string, array [, fieldsep [, seps ] ])

Divide string into pieces separated by fieldsep and store the pieces in array and the separator strings in the seps array. The first piece is stored in array[1], the second piece in array[2], and so forth. The string value of the third argument, fieldsep, is a regexp describing where to split string (much as FS can be a regexp describing where to split input records). If fieldsep is omitted, the value of FS is used. split() returns the number of elements created. seps is a gawk extension, with seps[i] being the separator string between array[i] and array[i+1]. If fieldsep is a single space, then any leading whitespace goes into seps[0] and any trailing whitespace goes into seps[n], where n is the return value of split() (i.e., the number of elements in array).
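
As a quick check of the return value described above, here is a minimal sketch that avoids the gawk-only seps argument, so it should behave the same in any POSIX awk (the array name `t` and the sample strings are illustrative). It shows that a single-character separator keeps empty pieces and that an empty string yields zero pieces:

```shell
# split() returns the number of pieces created.
split_counts=$(awk 'BEGIN {
  n = split("12||11", t, "|")      # the middle piece is empty
  print n, "[" t[2] "]"
  print split("", t, "|")          # nothing to split, so 0 pieces
}')
echo "$split_counts"
```

The first line of output should be `3 []` and the second `0`.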


Please be more specific! What do you mean by "it doesn't work"? Post the exact output (or error message), your OS and awk version:

% awk -F\| '{
  for (i = 0; ++i <= NF;)
    print i, $i
  }' <<<'12|23|11'
1 12
2 23
3 11

Or, using split:

% awk '{
  n = split($0, t, "|")
  for (i = 0; ++i <= n;)
    print i, t[i]
  }' <<<'12|23|11'
1 12
2 23
3 11
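
split() is not limited to $0. As a sketch under assumed input (the two sample records and their name/list layout are made up here), the following splits only the second field of each record and reports how many pieces it had:

```shell
# Each record is "name list", where the list is "|"-separated.
field_split=$(printf '%s\n' 'alice 12|23|11' 'bob 7|8' | awk '{
  n = split($2, t, "|")          # split only the second field
  print $1, n, t[n]              # name, piece count, last piece
}')
echo "$field_split"
```

Because split() clears the array before filling it, leftovers from the longer first record do not leak into the second one.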

Edit: on Solaris you'll need to use the POSIX awk (/usr/xpg4/bin/awk) in order to process 4000 fields correctly.


I do not like the echo "..." | awk ... solution, as it costs unnecessary fork and exec system calls.

I prefer Dimitre's solution with a little twist:

awk -F\| '{print $3 $2 $1}' <<<'12|23|11'

Or a bit shorter version:

awk -F\| '$0=$3 $2 $1' <<<'12|23|11'

In this case the assignment puts the output record together, and its value is non-empty and non-zero, so it counts as a true condition and the record gets printed.

In this specific case the stdin redirection can be avoided by passing the string in an awk variable:

awk -v T='12|23|11' 'BEGIN{split(T,a,"|");print a[3] a[2] a[1]}'
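
The same BEGIN approach need not hard-code three indices. As a sketch (the variable name `out` is illustrative), the loop below walks the array backwards using the count returned by split(), so it should work for any number of items:

```shell
T='12|23|11'
reversed=$(awk -v T="$T" 'BEGIN {
  n = split(T, a, "|")
  out = ""
  for (i = n; i >= 1; i--)     # walk from the last piece to the first
    out = out a[i]
  print out
}')
echo "$reversed"
```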

I used ksh for quite a while, but in bash this can also be done with built-in string manipulation. In the first case the original string is taken apart at the separator using parameter expansion. In the second case it is assumed that the string always consists of two-digit items separated by a one-character separator.

T='12|23|11';echo -n ${T##*|};T=${T%|*};echo ${T#*|}${T%|*}
T='12|23|11';echo ${T:6}${T:3:2}${T:0:2}

The result in all cases is

112312

Actually, awk has a built-in input field separator variable, FS. This is how to use it. The result is not really an array; the pieces end up in the built-in $1, $2, ... field variables. For splitting a simple string this is easier.

echo "12|23|11" | awk 'BEGIN {FS="|";} { print $1, $2, $3 }'

I know this is kind of an old question, but I thought someone might like my trick, especially since this solution is not limited to a specific number of items.

# Convert to an array
_ITEMS=($(echo "12|23|11" | tr '|' '\n'))

# Output array items
for _ITEM in "${_ITEMS[@]}"; do
  echo "Item: ${_ITEM}"
done

The output will be:

Item: 12
Item: 23
Item: 11