
Awk split string into words and numbers

Tags: awk

I am trying to split on letter and number boundaries, but my solution with lookarounds fails:

echo 50cats30dogs100squirrels | awk '{split($0,a,/(?<=\D)(.*)(?=\d)/); print a[1],a[2],a[3]}'

awk: illegal primary in regular expression (?<=\D)(.*)(?=\d) at <=\D)(.*)(?=\d)
 source line number 1
 context is
     >>> {split($0,a,/(?<=\D)(.*)(?=\d)/) <<<

Is there another way to do this in Awk?

Edit:

Sorry for not being clear. The expected output is to just add spaces like this:

50 cats 30 dogs 100 squirrels
Lechu asked Mar 05 '21 13:03


4 Answers

Based on your shown samples only, could you please try the following? It was written and tested in GNU awk, and I believe it should work in any awk.

echo "50cats30dogs100squirrels" | awk '{gsub(/[^0-9]+/," & ")} 1'

Output will be as follows for shown samples:

50 cats 30 dogs 100 squirrels
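One detail worth noting: because the replacement " & " pads every non-digit run on both sides, the line printed by the command above ends with a trailing space (and would start with one if the input began with letters). A minimal sketch of a variant that avoids this, assuming only POSIX awk features; the function name split_runs is hypothetical and used only for illustration:

```shell
# Hypothetical helper: put a single space between digit runs and non-digit runs.
split_runs() {
  awk '{
    # Append one space after every run of digits or run of non-digits...
    gsub(/[0-9]+|[^0-9]+/, "& ")
    # ...then drop the single trailing space left after the last run.
    sub(/ $/, "")
    print
  }'
}

echo "50cats30dogs100squirrels" | split_runs
```

Note that [^0-9]+ also matches any whitespace already present in the input, so this sketch is only meant for unspaced strings like the sample.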
RavinderSingh13 answered Oct 21 '22 20:10


Is there another way to do this in Awk?

I would use GNU AWK for this task in the following way. Let the content of file.txt be

50cats30dogs100squirrels

then

awk 'BEGIN{FPAT="([[:alpha:]]+)|([[:digit:]]+)"}{$1=$1;print}' file.txt

output

50 cats 30 dogs 100 squirrels

Explanation: using FPAT, I tell AWK that a field is either one or more letters or one or more digits. Then I assign $1=$1 to force the record to be rebuilt with the output field separator (without $1=$1 the output would be the same as the input) and print it.

(tested in gawk 4.2.1)
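The $1=$1 rebuild described in the explanation is not specific to FPAT, so it can be tried in any awk. A small sketch using a colon-separated sample line; the input "a:b:c" and the helper name rebuild_with_ofs are made up for illustration:

```shell
# Hypothetical helper: read colon-separated input and rebuild each record with the given OFS.
rebuild_with_ofs() {
  awk -F: -v OFS="$1" '{ $1 = $1; print }'
}

echo "a:b:c" | awk -F: '{ print }'     # no field assignment: prints a:b:c unchanged
echo "a:b:c" | rebuild_with_ofs " "    # rebuilt with a space: a b c
echo "a:b:c" | rebuild_with_ofs "-"    # rebuilt with a dash:  a-b-c
```

Assigning any field makes awk rejoin the fields with OFS, which is a single space by default; that is why the FPAT command prints the fields separated by spaces.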

Daweo answered Oct 21 '22 21:10


You could try this:

echo 50cats30dogs100squirrels | awk '{while (match($0, /[0-9]+|[a-zA-Z]+/)) {print substr($0, RSTART, RLENGTH);$0=substr($0, RSTART+RLENGTH)}}'

Which yields:

50
cats
30
dogs
100
squirrels
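Since the question asks for the pieces on one line separated by spaces, the same match() loop can collect the runs into a string instead of printing each one. A minimal sketch under that assumption; the helper name join_runs and the variables out and sep are hypothetical:

```shell
# Hypothetical helper: collect each run found by match() and join them with spaces.
join_runs() {
  awk '{
    out = ""; sep = ""
    while (match($0, /[0-9]+|[a-zA-Z]+/)) {
      out = out sep substr($0, RSTART, RLENGTH)   # append the matched run
      sep = " "                                   # separator for every run after the first
      $0 = substr($0, RSTART + RLENGTH)           # continue after the match
    }
    print out
  }'
}

echo 50cats30dogs100squirrels | join_runs
```

As in the loop above, characters that are neither ASCII letters nor digits are skipped rather than copied to the output.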

Nick Mancuso answered Oct 21 '22 19:10


(?<=\D)(.*)(?=\d) is a PCRE (Perl-Compatible Regular Expression). None of the mandatory Unix tools defined by the POSIX standard support PCREs; awk in particular supports EREs (Extended Regular Expressions).

With GNU awk for FPAT:

$ echo '50cats30dogs100squirrels' | awk -v FPAT='[0-9]+|[^0-9]+' '{$1=$1}1'
50 cats 30 dogs 100 squirrels
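To relate the ERE point back to the split() call in the question: \d and \D are not part of POSIX ERE syntax, but the bracket expressions [0-9] and [^0-9] cover the same ground, and split() accepts them as a separator. A minimal sketch, assuming POSIX awk; the helper name extract_numbers is hypothetical, and it pulls out only the numbers rather than producing the full spaced output:

```shell
# Hypothetical helper: use split() with an ERE separator to keep only the numbers.
extract_numbers() {
  awk '{
    # Every run of non-digits acts as a separator between the pieces.
    n = split($0, parts, /[^0-9]+/)
    out = ""; sep = ""
    for (i = 1; i <= n; i++) {
      # Pieces can be empty (for example after a trailing separator), so skip those.
      if (parts[i] != "") {
        out = out sep parts[i]
        sep = " "
      }
    }
    print out
  }'
}

echo '50cats30dogs100squirrels' | extract_numbers
```

Because split() discards the separators, it cannot keep both the words and the numbers in a single call, which is why the answers here rely on FPAT, gsub(), or match() instead.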
Ed Morton answered Oct 21 '22 21:10