I am trying to split on letter and number boundaries, but the solution with lookarounds fails:
echo 50cats30dogs100squirrels | awk '{split($0,a,/(?<=\D)(.*)(?=\d)/); print a[1],a[2],a[3]}'
awk: illegal primary in regular expression (?<=\D)(.*)(?=\d) at <=\D)(.*)(?=\d)
source line number 1
context is
>>> {split($0,a,/(?<=\D)(.*)(?=\d)/) <<<
Is there another way to do this in Awk?
Edit:
Sorry for not being clear. The expected output is to just add spaces like this:
50 cats 30 dogs 100 squirrels
The awk function split(s,a,sep) splits a string s into an awk array a using the delimiter sep. Variable hms is an array, so hms[2] is 34. The last three statements are equivalent, but the last two are more convenient for longer arrays. In the second you can specify the start index and the number of elements to print.
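The example that excerpt refers to is not shown here. As a minimal sketch, assuming the string being split is a time such as 12:34:56 (an illustrative value, not taken from the excerpt), split() would fill hms like this:

```shell
# Assumed input "12:34:56"; split() returns the number of pieces it stored.
result=$(awk 'BEGIN { n = split("12:34:56", hms, ":"); print n, hms[2] }')
echo "$result"   # prints: 3 34
```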
The field separator can be either a single character or a regular expression. It controls the way awk splits an input record into the fields. By default, awk uses both space and tab characters as the field separator. You can tell awk how fields are separated using the -F option on the command line.
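A small sketch of both cases, a single-character separator and a regular-expression separator. The alice:30:admin record is a made-up example; the second command reuses the sample string from the question and treats every run of digits as a separator, so $1 is the empty text before the leading 50.

```shell
# Hypothetical colon-separated record; -F: makes ":" the field separator.
second=$(printf 'alice:30:admin\n' | awk -F: '{ print $2 }')
echo "$second"   # prints: 30

# A regular expression as the separator: each run of digits splits the record.
words=$(echo 50cats30dogs100squirrels | awk -F'[0-9]+' '{ print $2, $3, $4 }')
echo "$words"    # prints: cats dogs squirrels
```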
Before splitting the string, patsplit() deletes any previously existing elements in the arrays array and seps. Divide string into pieces separated by fieldsep and store the pieces in array and the separator strings in the seps array.
With your shown samples only, could you please try the following, if this is what you are looking for. Written and tested in GNU awk (should work in any awk, I believe).
echo "50cats30dogs100squirrels" | awk '{gsub(/[^0-9]+/," & ")} 1'
Output will be as follows for shown samples:
50 cats 30 dogs 100 squirrels
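One thing to be aware of: " & " pads every run of non-digits on both sides, so the line above ends with a trailing space after squirrels (and would begin with a space if the input started with letters). If that matters, a possible variant, offered only as a sketch, is to add $1=$1 so that awk rebuilds the record from its fields and drops the surrounding blanks:

```shell
# gsub() changes $0, which re-splits the fields; $1=$1 then rebuilds $0
# with single spaces between fields and no leading/trailing blanks.
trimmed=$(echo "50cats30dogs100squirrels" | awk '{ gsub(/[^0-9]+/, " & "); $1 = $1 } 1')
printf '[%s]\n' "$trimmed"   # prints: [50 cats 30 dogs 100 squirrels]
```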
Is there another way to do this in Awk?
I would use GNU AWK for this task in the following way. Let the content of file.txt be
50cats30dogs100squirrels
then
awk 'BEGIN{FPAT="([[:alpha:]]+)|([[:digit:]]+)"}{$1=$1;print}' file.txt
output
50 cats 30 dogs 100 squirrels
Explanation: using FPAT, I instruct AWK that a field is either one or more letters or one or more digits. Then I do $1=$1 to force the record to be rebuilt (without $1=$1; the output would be the same as the input) and print it.
(tested in gawk 4.2.1)
You could try this:
echo 50cats30dogs100squirrels | awk '{while (match($0, /[0-9]+|[a-zA-Z]+/)) {print substr($0, RSTART, RLENGTH);$0=substr($0, RSTART+RLENGTH)}}'
Which yields:
50
cats
30
dogs
100
squirrels
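The loop above prints each piece on its own line. If the single space-separated line from the question is wanted instead, one possible adaptation (a sketch; the variable names s and out are my own) is to collect the pieces in a string and print it once at the end:

```shell
joined=$(echo 50cats30dogs100squirrels | awk '{
  s = $0; out = ""
  # Repeatedly take the leftmost run of digits or letters from s.
  while (match(s, /[0-9]+|[a-zA-Z]+/)) {
    out = out (out == "" ? "" : " ") substr(s, RSTART, RLENGTH)
    s = substr(s, RSTART + RLENGTH)   # continue after the piece just taken
  }
  print out
}')
echo "$joined"   # prints: 50 cats 30 dogs 100 squirrels
```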
(?<=\D)(.*)(?=\d) is a PCRE. None of the mandatory Unix tools defined by the POSIX standard support PCREs; awk in particular supports EREs.
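For illustration, the split() call from the question does run once the separator is written as an ERE, with the bracket expression [^0-9] standing in for \D. This is only a sketch of the syntax difference: it collects the numbers and discards the letters, so it does not produce the spaced output by itself.

```shell
# ERE separator: one or more non-digit characters (no lookarounds, no \D).
nums=$(echo 50cats30dogs100squirrels | awk '{ split($0, a, /[^0-9]+/); print a[1], a[2], a[3] }')
echo "$nums"   # prints: 50 30 100
```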
With GNU awk for FPAT:
$ echo '50cats30dogs100squirrels' | awk -v FPAT='[0-9]+|[^0-9]+' '{$1=$1}1'
50 cats 30 dogs 100 squirrels