Background:
I have a column that should receive user input in the form "Description text ref12345678". I have existing scripts that grab the reference number, but unfortunately some users enter it incorrectly, so instead of "ref12345678" it can be "ref 12345678", "RF12345678", "abcd12345678", or any other variation. Naturally, the wrong formatting breaks some of the triggered scripts. For now I can't control the user input to this field, so I want the scripts later in the pipeline to extract just the number.
At the moment I'm stripping the letters with awk '{gsub(/[[:alpha:]]/, "")}; 1', but substitution seems like an inefficient solution. (I know I can also do this with sed -n 's/.*[a-zA-Z]//p' and tr -d '[[:alpha:]]', but they are essentially the same, and I want awk for additional programmability.)
The question is: is there a way to make awk either print only the numbers from a string, or set delimiters around the numeric items in a string? (Or is substitution really the most efficient solution for this problem?)
So in summary: how do I use awk on the output of $ echo "ref12345678" to print only "12345678" without substitution?
If awk is not a must:
grep -o '[0-9]\+'
Example:
kent$ echo "ref12345678"|grep -o '[0-9]\+'
12345678
With awk, for your example:
kent$ echo "ref12345678"|awk -F'[^0-9]*' '$0=$2'
12345678
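The one-liner above relies on the leading letters being consumed as a field separator, so it depends on where the number sits in the line. As an alternative that also avoids substitution, here is a minimal sketch using awk's standard match() function with RSTART and RLENGTH. The wrapper name extract_number is made up for illustration and is not part of the original answer.

```shell
# extract_number is a hypothetical helper name, not from the original post.
# For each input line, print the first run of digits, using POSIX awk's
# match(), which sets RSTART (start position) and RLENGTH (match length).
# Lines without any digits produce no output.
extract_number() {
    awk 'match($0, /[0-9]+/) { print substr($0, RSTART, RLENGTH) }'
}

# The variations from the question all yield the same number.
echo "Description text ref12345678" | extract_number
echo "ref 12345678" | extract_number
echo "RF12345678" | extract_number
echo "abcd12345678" | extract_number
```

Each of the four calls prints 12345678. Note that this picks the first digit run on the line, so if the description text itself may contain digits, the pattern would need to be anchored differently (for example, to the end of the line).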