I'm working on an AWK script that parses millions of lines of text. Each line contains (among other things) a date and time in the form:
16-FEB-2008 14:17:59.994669
I need to convert this into the following form:
20080216141759994669000
I would like to avoid translating the month from text into a numerical value manually, if possible. In bash I can simply run the following command to get the desired result:
date -d "16-FEB-2008 14:17:59.994669" +"%Y%m%d%H%M%S%N"
I have tried invoking this command from AWK, but I cannot figure out how to do it. I would like to know whether this is possible.
Thanks in advance
Converting month names to numbers in awk is easy, and so is the reformatting as long as you don't need the (additional) validation date does 'for free':
$ echo this 16-FEB-2008 14:17:59.994669 that \
> | awk '{ split($2,d,"-"); split($3,t,"[:.]"); 
    m=sprintf("%02d",index("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC",d[2])/4+1);
    print $1,d[3] m d[1] t[1] t[2] t[3] t[4] "000",$4 }'
this 20080216141759994669000 that
$ # or can put the script in a file and use with awk -f
$ # or the whole thing in a shebang file like #!/bin/awk -f
This is not much longer than the code to run date, and it is much more efficient for 'millions of lines' because it does not start an external process for every line.
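If the index() arithmetic feels too terse, the same idea can be written with a lookup table that is built once in BEGIN. This is only a sketch under the same assumptions as the one-liner above (date in field 2, time in field 3); the function name to_stamp and the "bad month" message are made up for illustration. Unlike the one-liner, it accepts lower-case month names and skips lines whose month is not recognised instead of silently printing 01.

```shell
# Hypothetical helper: reads lines on stdin, rewrites fields 2 and 3
# (e.g. "16-FEB-2008 14:17:59.994669") into one 23-digit timestamp.
to_stamp() {
    awk '
    BEGIN {
        # Build the table once: mon["JAN"] = "01", ..., mon["DEC"] = "12"
        n = split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", names, " ")
        for (i = 1; i <= n; i++) mon[names[i]] = sprintf("%02d", i)
    }
    {
        split($2, d, "-")       # d[1] = day, d[2] = month name, d[3] = year
        split($3, t, "[:.]")    # t[1] = hour, t[2] = min, t[3] = sec, t[4] = microseconds
        key = toupper(d[2])
        if (!(key in mon)) {    # unknown month: report it and skip the line
            print "bad month on line " NR ": " d[2] > "/dev/stderr"
            next
        }
        # Pad the microseconds with "000" to get the nanosecond form
        print $1, d[3] mon[key] d[1] t[1] t[2] t[3] t[4] "000", $4
    }'
}

echo 'this 16-FEB-2008 14:17:59.994669 that' | to_stamp
```

For the sample line this prints the same result as the one-liner. Like the one-liner, it does not check that the day, year, or time fields are sensible.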
You can call an external command like this:
awk '{
         cmd="date -d \""$0"\" +%Y%m%d%H%M%S%N"
         cmd | getline ts
         print $0, ts
         # awk opened a pipe for the communication with 
         # the command. close that pipe to avoid running
         # out of file descriptors
         close(cmd)
     }' <<< '16-FEB-2008 14:17:59.994669'
Output:
16-FEB-2008 14:17:59.994669 20080216141759994669000
Thanks to dave_thompson_085's comment: you can significantly improve the performance if you have date from GNU coreutils and gawk. GNU date supports reading dates from stdin, and gawk supports co-processes, which lets you start a single instance of date in the background, write to its stdin, and read from its stdout:
{
    cmd = "stdbuf -oL date -f /dev/stdin +%Y%m%d%H%M%S%N"
    print $0 |& cmd 
    cmd |& getline ts
    print $0, ts
}
Note that you also need the stdbuf command to force date to write its results line by line; otherwise its output stays buffered and getline waits for it indefinitely.
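Since the question says each line contains other things besides the date, here is a sketch of how the co-process version might be wrapped for such input. It assumes gawk, GNU date, and stdbuf are installed, and that the date and time are fields 2 and 3 as in the first answer's sample; the function name stamp_lines is made up for illustration. Be aware that if date rejects a line it prints nothing to stdout, so getline would wait forever; input that may contain invalid dates needs extra handling that is not shown here.

```shell
# Hypothetical wrapper: requires gawk, GNU date and stdbuf.
# Reads lines on stdin and replaces fields 2 and 3 with the timestamp.
stamp_lines() {
    gawk '
    BEGIN {
        # One long-running date process serves every input line
        cmd = "stdbuf -oL date -f /dev/stdin +%Y%m%d%H%M%S%N"
    }
    {
        print $2 " " $3 |& cmd            # send only the date and time
        if ((cmd |& getline ts) <= 0) {   # co-process ended or failed
            print "date gave no answer on line " NR > "/dev/stderr"
            exit 1
        }
        print $1, ts, $4
    }
    END { close(cmd) }                    # shut the co-process down
    '
}
```

Piping the first answer's sample line through stamp_lines should give the same output as that answer, but with date doing the parsing and validation.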