Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

system() yields inconsistent results

Tags:

date

shell

awk

I am using awk to add a column to a CSV file. Each line of the file starts with a date & time. I want the new column to be that time as a Unix epoch timestamp.

I'm feeding the first three fields to 'date' which does give me the timestamp, but there are some stray commas "," that I don't want. Below is a simplified version printing just that one timestamp column as output. Input file:

08/17/2020 21:46:04 -700 , 1 , 2 , 3
08/17/2020 21:47:03 -700 , 1 , 2 , 3
08/17/2020 21:48:03 -700 , 1 , 2 
08/17/2020 21:49:04 -700 , 1 , 2 

Here is my program:

cat input.csv | awk '{
       tmp=$(system("date +%s -d \""$1" "$2" "$3"\""));
       printf("%s", $tmp );
       }'

and here is the output. This is what I expect, except for the leading commas on lines 2 and 3. I think the 'tmp' variable gets the result from 'date' including a trailing newline, but also sometimes with a comma after the newline. It seems to depend on how many fields in the input file follow the first three, but I only reference the first three, as $1 $2 $3 so the rest of the input line plays no role (?) Why do those comma show up in the output, and how could I remove them? Is there a better way to do this?

1597725964
,1597726023
,1597726083
1597726144
like image 678
John Beale Avatar asked Aug 18 '20 05:08

John Beale


2 Answers

Using shell

Try:

$ while read -r line; do date +%s -d "${line%%,*}"; done < input.csv
1597725964
1597726023
1597726083
1597726144

How it works

  1. while read -r line; do starts a while loop and reads a line from stdin.

  2. "${line%%,*}" strips the commas and everything after them from the line.

  3. date +%s -d "${line%%,*}" prints the date as epoch.

  4. done completes the while loop.

  5. <input.csv provides the stdin to the loop.

Variation

This prints the full line and adds the epoch as the final column:

$ while read line; do printf "%s, %s\n" "$line" $(date +%s -d "${line%%,*}"); done < input.csv
08/17/2020 21:46:04 -700 , 1 , 2 , 3, 1597725964
08/17/2020 21:47:03 -700 , 1 , 2 , 3, 1597726023
08/17/2020 21:48:03 -700 , 1 , 2, 1597726083
08/17/2020 21:49:04 -700 , 1 , 2, 1597726144
like image 114
John1024 Avatar answered Oct 20 '22 13:10

John1024


In awk you can use a coprocess with getline instead of system():

< input.csv awk -F' , ' '{
    "date +%s -d \047"$1"\047\n" | getline date
    print date
}'
1597725964
1597726023
1597726083
1597726144

With the help of Inian and oguz ismail in comments, and gawk, we came up with a better solution, which writes into date's stdin, instead of passing the arguments via command line to it. That's better because interpolating variables into a command line always comes with the risk of shell command injection (via input.csv).

< input.csv gawk -F' , ' '{
    cmd = "date +%s -f-";
    print $1 |& cmd;
    close(cmd, "to");
    if ((cmd |& getline line) > 0)
        print line; close(cmd)
}'
1597725964
1597726023
1597726083
1597726144

Thanks to both!

like image 34
hek2mgl Avatar answered Oct 20 '22 11:10

hek2mgl