
Re-order columns in a text file by a specific pattern

I'm very new to awk and have been banging my head trying to get this to work. I'm trying to take a list of files in "image.list" and create an "info" file from it. I need to grab the string matching a regex (a number 8-11 digits long) from the middle of each filename and print just that match in the designated spot in my "info" file. That last part is what I'm having trouble with, and I'd love some help fixing it.

Here is my test file list:

SURGERY0001275678image1.jpg
SURGERY11134900211image2.jpg
SURGERY19257012image3.jpg
SURGERY273142590image4.jpg

Here is my current code:

awk 'BEGIN {print "-----TEST TAG FILE\tENCOUNTERS-----";}
> {print "FILE:  /tmp/imagetest/"$1,"\t","ENCOUNTER: ",($1~/^[0-9]{8,11}$/);}
> END{print "END REPORT";
> }' image.list > upload.tag

And here is my current output:

-----TEST TAG FILE      ENCOUNTERS-----
FILE:  /tmp/imagetest/SURGERY0001275678image1.jpg        ENCOUNTER:  0
FILE:  /tmp/imagetest/SURGERY11134900211image2.jpg       ENCOUNTER:  0
FILE:  /tmp/imagetest/SURGERY19257012image3.jpg          ENCOUNTER:  0
FILE:  /tmp/imagetest/SURGERY273142590image4.jpg         ENCOUNTER:  0
END REPORT

What I need it to display after "ENCOUNTER:" is the 8-11 digit number from the middle of the file name. So far everything I've tried outputs either the whole filename or "0".

I'm probably way off course so I'd love to get some help from you experts!

CyberSamurai asked Dec 08 '25 23:12



2 Answers

Re-using your existing code:

$ awk '
BEGIN {
    print "-----TEST TAG FILE\tENCOUNTERS-----";
}
match($0,/[^0-9]+([0-9]+)[^0-9]+/,ary) {
    print "FILE:  /tmp/imagetest/"$1,"\t","ENCOUNTER:"ary[1]
}
END { 
    print "END REPORT";
}' testfile

Test:

$ cat testfile
SURGERY0001275678image1.jpg
SURGERY11134900211image2.jpg
SURGERY19257012image3.jpg
SURGERY273142590image4.jpg

$ awk '
> BEGIN {
>     print "-----TEST TAG FILE\tENCOUNTERS-----";
> }
> match($0,/[^0-9]+([0-9]+)[^0-9]+/,ary) {
>     print "FILE:  /tmp/imagetest/"$1,"\t","ENCOUNTER:"ary[1]
> }
> END { 
>     print "END REPORT";
> }' testfile
-----TEST TAG FILE      ENCOUNTERS-----
FILE:  /tmp/imagetest/SURGERY0001275678image1.jpg        ENCOUNTER:0001275678
FILE:  /tmp/imagetest/SURGERY11134900211image2.jpg       ENCOUNTER:11134900211
FILE:  /tmp/imagetest/SURGERY19257012image3.jpg          ENCOUNTER:19257012
FILE:  /tmp/imagetest/SURGERY273142590image4.jpg         ENCOUNTER:273142590
END REPORT

As Ed Morton pointed out in the comments, passing an array as the third argument to match() is a GNU awk extension, so this solution requires gawk.
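
If gawk is not available, something along these lines may work with any POSIX awk. This is only a sketch, not a tested drop-in: it relies on the two-argument match(), which sets RSTART and RLENGTH, and pulls the number out with substr(). It assumes the first run of digits in each name is the encounter number, checks the 8-11 digit length separately to avoid depending on interval expressions, and joins the fields with a bare tab instead of the comma-separated print used above. The printf step just recreates the sample list from the question.

```shell
# Recreate the sample list from the question.
printf '%s\n' \
  SURGERY0001275678image1.jpg \
  SURGERY11134900211image2.jpg \
  SURGERY19257012image3.jpg \
  SURGERY273142590image4.jpg > image.list

awk '
BEGIN { print "-----TEST TAG FILE\tENCOUNTERS-----" }
# match() returns 0 when nothing matches; otherwise RSTART and RLENGTH
# describe the leftmost-longest run of digits in the line.
# Assumption: that first run of digits is the encounter number.
match($0, /[0-9]+/) && RLENGTH >= 8 && RLENGTH <= 11 {
    print "FILE:  /tmp/imagetest/" $1 "\t" "ENCOUNTER: " substr($0, RSTART, RLENGTH)
}
END { print "END REPORT" }
' image.list > upload.tag
```

The original attempt printed 0 because `$1 ~ /regex/` evaluates to 1 or 0 (match or no match) rather than to the matched text, and the ^ and $ anchors required the whole field to consist of digits.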

jaypal singh answered Dec 11 '25 12:12



With GNU sed:

sed -r -e 's#(.*)#FILE:\t/tmp/imagetest/\1#;s/([0-9]*)(i[^i]*)$/\1\2\tENCOUNTER:\1/;1i -----TEST TAG FILE      ENCOUNTERS-----' -e '$aEND REPORT' file
-----TEST TAG FILE      ENCOUNTERS-----
FILE:   /tmp/imagetest/SURGERY0001275678image1.jpg      ENCOUNTER:0001275678
FILE:   /tmp/imagetest/SURGERY11134900211image2.jpg     ENCOUNTER:11134900211
FILE:   /tmp/imagetest/SURGERY19257012image3.jpg        ENCOUNTER:19257012
FILE:   /tmp/imagetest/SURGERY273142590image4.jpg       ENCOUNTER:273142590
END REPORT
captcha answered Dec 11 '25 14:12
