How to format a string date (with text and milliseconds) using AWK

Tags: bash, shell, unix, awk

I'm working on an AWK script that parses millions of lines of text. Each line contains (among other things) a date and time in the form:

16-FEB-2008 14:17:59.994669

I need to convert this into the following form:

20080216141759994669000

I would like to avoid manually translating the month from text into a numerical value, if possible. In bash I can simply run the following command to get the desired result:

date -d "16-FEB-2008 14:17:59.994669" +"%Y%m%d%H%M%S%N"

I have tried invoking this command from within AWK, but I cannot figure out how. I would like to know:

  1. Is it possible to achieve this with AWK alone?
  2. How can I use such a command in an AWK script file?

Thanks in advance

DoubleTrouble asked Dec 07 '22 19:12


2 Answers

Converting month names to numbers in awk is easy, and so is the reformatting, as long as you don't need the (additional) validation that date does 'for free':

$ echo this 16-FEB-2008 14:17:59.994669 that \
> | awk '{ split($2,d,"-"); split($3,t,"[:.]"); 
    m=sprintf("%02d",index("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC",d[2])/4+1);
    print $1,d[3] m d[1] t[1] t[2] t[3] t[4] "000",$4 }'
this 20080216141759994669000 that
$ # or can put the script in a file and use with awk -f
$ # or the whole thing in a shebang file like #!/bin/awk -f

This is not much longer than the code to run date, and it is much more efficient for 'millions of lines' because it does not start a new process for every line.
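The same idea can be written with an explicit lookup table instead of the index() arithmetic, which some may find easier to read. This is only a sketch, not part of the original answer: the function name convert_stamps is hypothetical, and it assumes (as the one-liner above does) that the date is field 2, the time is field 3, the numeric parts are already zero-padded, and the fraction has exactly six digits.

```shell
# Hypothetical helper, not from the original answer.
convert_stamps() {
    awk '
    BEGIN {
        # Build a month-name -> two-digit-number table once, at startup.
        n = split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", names, " ")
        for (i = 1; i <= n; i++)
            month[names[i]] = sprintf("%02d", i)
    }
    {
        split($2, d, "-")       # d[1]=day, d[2]=month name, d[3]=year
        split($3, t, "[:.]")    # t[1]=HH, t[2]=MM, t[3]=SS, t[4]=microseconds
        mon = toupper(d[2])     # accept feb, Feb and FEB alike
        if (!(mon in month)) {  # unknown month name: pass the line through unchanged
            print
            next
        }
        # Append "000" to turn microseconds into nanoseconds.
        print $1, d[3] month[mon] d[1] t[1] t[2] t[3] t[4] "000", $4
    }'
}
```

For example, echo 'this 16-feb-2008 14:17:59.994669 that' | convert_stamps prints this 20080216141759994669000 that. It uses only POSIX awk features, so it should not depend on gawk.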

dave_thompson_085 answered Feb 16 '23 01:02


You can call an external command like this:

awk '{
         cmd="date -d \""$0"\" +%Y%m%d%H%M%S%N"
         cmd | getline ts
         print $0, ts
         # awk opened a pipe for the communication with 
         # the command. close that pipe to avoid running
         # out of file descriptors
         close(cmd)
     }' <<< '16-FEB-2008 14:17:59.994669'

Output:

16-FEB-2008 14:17:59.994669 20080216141759994669000
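Since the lines in the question contain other text besides the timestamp, the command usually has to be built from individual fields rather than from $0. The following is a rough sketch under that assumption: the function name convert_with_date is hypothetical, the date is assumed to be field 2 and the time field 3, and GNU date is required for -d and %N.

```shell
# Hypothetical helper, not from the original answer. Requires GNU date.
convert_with_date() {
    awk '{
        # The fields are pasted into a shell command without escaping,
        # so this is only suitable for trusted input.
        cmd = "date -d \"" $2 " " $3 "\" +%Y%m%d%H%M%S%N"
        if ((cmd | getline ts) > 0)
            print $1, ts, $4    # date produced a line: use it
        else
            print               # date printed nothing: keep the line unchanged
        close(cmd)              # release the pipe before the next line
    }'
}
```

For example, echo 'this 16-FEB-2008 14:17:59.994669 that' | convert_with_date should print this 20080216141759994669000 that. Checking the return value of getline avoids reusing a stale ts from a previous line when date rejects its input.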

Thanks to dave_thompson_085's comment: you can significantly improve the performance if you have date from GNU coreutils and gawk. GNU date supports reading dates from stdin, and gawk supports co-processes, which allow you to start a single instance of date in the background, write to its stdin and read from its stdout:

{
    cmd = "stdbuf -oL date -f /dev/stdin +%Y%m%d%H%M%S%N"
    print $0 |& cmd 
    cmd |& getline ts
    print $0, ts
}

Note that you also need the stdbuf command to force date to flush its output line by line; otherwise getline would block waiting for buffered output.
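A complete invocation of the co-process approach might look like the sketch below. The function name convert_with_coprocess is hypothetical, and the field layout (date in field 2, time in field 3) is an assumption. It needs gawk for the |& operator, plus GNU date and stdbuf.

```shell
# Hypothetical helper, not from the original answer.
# Requires gawk (for |&), GNU date (for -f) and stdbuf.
convert_with_coprocess() {
    gawk '
    BEGIN { cmd = "stdbuf -oL date -f /dev/stdin +%Y%m%d%H%M%S%N" }
    {
        stamp = $2 " " $3
        print stamp |& cmd              # send one date to the running date process
        if ((cmd |& getline ts) > 0)    # read back the one converted line
            print $1, ts, $4
    }
    END { close(cmd) }                  # shut down the co-process when input ends
    '
}
```

One caveat with this sketch: it assumes every line carries a date that date can parse. If date rejects a line it writes only to stderr, so the following getline would wait for output that never arrives.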

hek2mgl answered Feb 15 '23 23:02