awk regex how can I match or capture this string

Tags: regex, bash, awk

How do I match or capture these strings:

So far, I have tried these two regex patterns, which both produce the same matches:

/file="ZZ([^-]+)-[^"]+\.XML"/ - awk

/(?<=ZZ)\w++/ - not supported in awk

Text to be processed in awk:

file="ZZ12345678-20170101.XML"
file="ZZ87654321-19990101.XML"
file="ZZAA123456-20170101.XML"
file="ZZBB654321-20170101.XML"
file="ZZAA123456A1-20170101.XML"
file="ZZBB654321B2-19990101.XML"
file="ZZCC123456C3-20170101.XML"

The problem is the single letter and single digit (A1, B2, C3) that sometimes follow the series of numbers: they end up in the match but should be excluded.

file="ZZ12345678-20170101.XML" correctly matches 12345678

file="ZZ87654321-19990101.XML" correctly matches 87654321

file="ZZAA123456-20170101.XML" correctly matches AA123456

file="ZZBB654321-20170101.XML" correctly matches BB654321

file="ZZAA123456A1-20170101.XML" incorrectly matches AA123456A1; the target match is AA123456

file="ZZBB654321B2-19990101.XML" incorrectly matches BB654321B2; the target match is BB654321

file="ZZCC123456C3-20170101.XML" incorrectly matches CC123456C3; the target match is CC123456

I would be grateful for help and example approaches.

Gabe asked Nov 07 '25 09:11


2 Answers

$ sed 's/.*ZZ\([[:upper:]]*[0-9]*\).*/\1/' file
12345678
87654321
AA123456
BB654321
AA123456
BB654321
CC123456

or with GNU awk for the 3rd arg to match():

$ awk 'match($0,/ZZ([[:upper:]]*[0-9]*)/,a){print a[1]}' file
12345678
87654321
AA123456
BB654321
AA123456
BB654321
CC123456

or also GNU awk for gensub():

$ awk '{print gensub(/.*ZZ([[:upper:]]*[0-9]*).*/,"\\1",1)}' file
12345678
87654321
AA123456
BB654321
AA123456
BB654321
CC123456
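
For an awk without the GNU extensions, the same idea can probably be expressed with the POSIX two-argument match() plus RSTART and RLENGTH. This is a minimal sketch, not part of the original answer: extract_ids is a hypothetical helper name, it reads from standard input, and [A-Z] stands in for [[:upper:]] on the assumption that the IDs are plain ASCII (some older awk builds lack POSIX character classes).

```shell
# Hypothetical helper: print the ID after "ZZ" (optional uppercase letters,
# then digits) for every input line that contains one.
extract_ids() {
    awk 'match($0, /ZZ[A-Z]*[0-9]+/) {
        # RSTART/RLENGTH describe the whole match, so skip the 2-char "ZZ".
        print substr($0, RSTART + 2, RLENGTH - 2)
    }'
}

printf '%s\n' 'file="ZZ12345678-20170101.XML"' \
              'file="ZZCC123456C3-20170101.XML"' | extract_ids
# prints:
# 12345678
# CC123456
```

Lines with no ZZ identifier print nothing, because match() is used as the pattern of the rule.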

Ed Morton answered Nov 10 '25 07:11


try:

awk '{match($0,/[a-zA-Z]+[0-9]+/);print substr($0,RSTART+2,RLENGTH-2);}' Input_file

This uses awk's match() function to find the first run of letters followed by digits, then prints that match with the leading ZZ skipped: the substring starts at RSTART+2 and has length RLENGTH-2.
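
To see what match() sets, the values can be printed for one of the problem lines. This is only an illustrative trace under the same regex, not part of the command above; trace_match is a hypothetical name and it reads from standard input.

```shell
# Hypothetical helper: show RSTART, RLENGTH, the matched text, and the
# result after dropping the 2-character "ZZ" prefix.
trace_match() {
    awk '{
        match($0, /[a-zA-Z]+[0-9]+/)
        print "RSTART=" RSTART, "RLENGTH=" RLENGTH
        print "matched: " substr($0, RSTART, RLENGTH)
        print "result:  " substr($0, RSTART + 2, RLENGTH - 2)
    }'
}

printf '%s\n' 'file="ZZAA123456A1-20170101.XML"' | trace_match
# prints:
# RSTART=7 RLENGTH=10
# matched: ZZAA123456
# result:  AA123456
```

The leftmost letters-then-digits run is ZZAA123456 only because "file" is followed by "=" rather than a digit. The approach depends on that, so a prefix such as file2= would shift the match.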

RavinderSingh13 answered Nov 10 '25 09:11


