
Regex "^[[:digit:]]$" not working as expected in AWK/GAWK

My GAWK version on RHEL is:

gawk-3.1.5-15.el5

I want to print a line only if its first field consists entirely of digits (no special characters, not even spaces).

Example:

echo "123456789012345,3" | awk -F, '{if ($1 ~ /^[[:digit:]]$/)  print $0}'

Output:
Nothing

Expected Output:
123456789012345,3

What is going wrong here? Does my AWK version not understand the [[:digit:]] character class? Kindly help.

dig_123 asked Dec 23 '22 22:12


1 Answer

The [[:digit:]] character class matches exactly one character, so ^[[:digit:]]$ only accepts a first field that is a single digit. To match multiple digits, add a + after the class, which means "one or more digits" in $1.

echo "123456789012345,3" | awk -F, '{if ($1 ~ /^([[:digit:]]+)$/)  print $0}'
123456789012345,3

which satisfies your requirement.
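
To see the difference side by side, here is a minimal sketch; the one-digit and two-digit sample inputs are made up for illustration and are not from the question.

```shell
# Original pattern: matches only when the first field is exactly one digit.
echo "5,3"  | awk -F, '$1 ~ /^[[:digit:]]$/'     # prints 5,3

# Two digits do not fit "exactly one digit", so nothing is printed.
echo "55,3" | awk -F, '$1 ~ /^[[:digit:]]$/'

# With + (one or more), a first field of any number of digits matches.
echo "55,3" | awk -F, '$1 ~ /^[[:digit:]]+$/'    # prints 55,3
```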

A more idiomatic way (as suggested in the comments) is to drop the explicit if and print. An awk pattern with no action prints every matching line by default:

echo "123456789012345,3" | awk -F, '$1 ~ /^([[:digit:]]+)$/'
123456789012345,3

Some more examples which demonstrate the same,

echo "a1,3" | awk -F, '$1 ~ /^([[:digit:]]+)$/'

(and)

echo "aa,3" | awk -F, '$1 ~ /^([[:digit:]]+)$/'

do NOT produce any output, as per the requirement.
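
The question also asks that spaces and other special characters be rejected. A short sketch with a few more made-up inputs shows that the anchored pattern covers those cases too, since -F, leaves any whitespace inside the field untouched:

```shell
# Leading space in the first field: not all digits, so no output.
echo " 123,3" | awk -F, '$1 ~ /^[[:digit:]]+$/'

# Empty first field: + needs at least one digit, so no output.
echo ",3" | awk -F, '$1 ~ /^[[:digit:]]+$/'

# Punctuation inside the field: no output.
echo "12-3,3" | awk -F, '$1 ~ /^[[:digit:]]+$/'
```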

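A POSIX-compliant way to also enforce a strict length is an interval expression, where {3} means "exactly three of the preceding item". Older gawk releases such as 3.1.5 only recognise interval expressions when run with --posix (or --re-interval), hence the flag below.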

echo "123,3" |  awk --posix -F, '$1 ~ /^[0-9]{3}$/'
123,3

(and)

echo "12,3" |  awk --posix -F, '$1 ~ /^[0-9]{3}$/'

does not produce any output.
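
Where interval expressions are unavailable or you would rather not depend on the flag, one possible alternative (a sketch, not part of the original answer) is to combine the digits-only pattern with an explicit length() test:

```shell
# Digits-only check plus an explicit length check; no interval expression needed.
echo "123,3" | awk -F, '$1 ~ /^[0-9]+$/ && length($1) == 3'    # prints 123,3

# Two digits pass the pattern but fail the length test, so nothing is printed.
echo "12,3"  | awk -F, '$1 ~ /^[0-9]+$/ && length($1) == 3'
```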

If you are using a reasonably recent version of bash (3.0 or later), it has a native regex operator, =~, which accepts the same POSIX character classes as above, something like

#!/bin/bash

while IFS=',' read -r row1 row2
do
   [[ $row1 =~ ^([[:digit:]]+)$ ]] && printf "%s,%s\n" "$row1" "$row2"
done < file

For an input file, say file,

$ cat file
122,12
a1,22
aa,12

The script produces,

$ bash script.sh
122,12

Although this works, bash regex matching can be slower. A relatively straightforward alternative using string manipulation would be something like

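while IFS=',' read -r row1 row2
do
   # -n rejects an empty field; -z is true once every digit has been stripped
   [[ -n "$row1" && -z "${row1//[0-9]/}" ]] && printf "%s,%s\n" "$row1" "$row2"
done < file

The "${row1//[0-9]/}" expansion strips all the digits from row1, so the -z test is true only if no other characters are left in the variable. The -n test additionally rejects an empty field, which would otherwise pass because nothing is left after stripping.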
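
If the script also has to run under a plain POSIX sh, a case pattern can do the same job without [[ ]]. This is only a sketch: is_all_digits is a hypothetical helper name, and the here-document stands in for the sample file shown earlier.

```shell
# is_all_digits is a hypothetical helper, not part of the original answer.
# It succeeds (exit status 0) only for a non-empty string made up of 0-9.
is_all_digits() {
    case $1 in
        '' | *[!0-9]*) return 1 ;;   # empty, or contains a non-digit
        *)             return 0 ;;
    esac
}

while IFS=',' read -r row1 row2
do
    if is_all_digits "$row1"; then
        printf '%s,%s\n' "$row1" "$row2"
    fi
done <<'EOF'
122,12
a1,22
aa,12
EOF
```

With the three sample rows, only 122,12 is printed.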

Inian answered Jan 29 '23 01:01