Say I have two log files (input.log and output.log) with the following format:
2012-01-16T12:00:00 12345678
The first field is the processing timestamp and the second is a unique ID. I'm trying to find:
- Records in input.log which don't have a corresponding record for that ID in output.log
- Records in input.log which have a record for that ID, but where the difference in the timestamps exceeds 5 seconds
I have a workaround solution with MySQL, but I'd ideally like to remove the database component and handle it with a shell script.
I have the following, which returns the lines of input.log with an added column if output.log contains the ID:
join -a1 -j2 -o 0 1.1 2.1 <(sort -k2,2 input.log) <(sort -k2,2 output.log)
Example output:
10111 2012-01-16T10:00:00 2012-01-16T10:00:04
11562 2012-01-16T11:00:00 2012-01-16T11:00:10
97554 2012-01-16T09:00:00
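(For reference, here is the same join with its options spelled out; this just annotates the command above, it changes nothing:)
# -a1            also print lines from file 1 (input.log) that have no match in output.log
# -j2            join on field 2 (the ID) of both files
# -o 0 1.1 2.1   output the join field, then field 1 (the timestamp) of each file
join -a1 -j2 -o 0 1.1 2.1 <(sort -k2,2 input.log) <(sort -k2,2 output.log)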
Main question:
Now that I have this information, how can I go about computing the differences between the two timestamps and discarding those over 5 seconds apart? I hit some problems processing the ISO 8601 timestamp with date (specifically the T) and assumed there must be a better way.
Edit: GNU coreutils date has supported ISO 8601 timestamps since late 2011, not long after this question was asked, so this is likely no longer an issue for anyone. See this answer.
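For example, on a GNU date new enough to accept the T separator, the conversion to epoch seconds works directly (a small sketch; the epoch values shown assume TZ=UTC):
TZ=UTC date -d '2012-01-16T11:00:00' +%s   # 1326711600
TZ=UTC date -d '2012-01-16T11:00:10' +%s   # 1326711610
# the difference of the two epoch values is the gap in seconds (10 here)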
Secondary question:
Is there perhaps a way to rework the entire approach, for instance into a single awk script? My knowledge of processing multiple files and setting up the correct inequalities for the output conditions was the limiting factor here, hence the approach above.
If you have GNU awk, then you can try something like this -
gawk '
NR==FNR{a[$2]=$1;next}
!($2 in a) {print $2,$1; next}
($2 in a) {
"date +%s -d " $1 | getline var1;
"date +%s -d " a[$2] | getline var2;
var3 = var2 - var1;
if (var3 > 4) print $2, $1, a[$2]
}' output.log input.log
[jaypal:~/Temp] cat input.log
2012-01-16T09:00:00 9
2012-01-16T10:00:00 10
2012-01-16T11:00:00 11
[jaypal:~/Temp] cat output.log
2012-01-16T10:00:04 10
2012-01-16T11:00:10 11
2012-01-16T12:00:00 12
[jaypal:~/Temp] gawk '
NR==FNR{a[$2]=$1;next}
!($2 in a) {print $2,$1; next}
($2 in a) {"date +%s -d " $1 | getline var1; "date +%s -d " a[$2] | getline var2;var3=var2-var1;if (var3>4) print $2,$1,a[$2] }' output.log input.log
9 2012-01-16T09:00:00
11 2012-01-16T11:00:00 2012-01-16T11:00:10
NR==FNR{a[$2]=$1;next}
We start off by storing the first field of your output.log file in an array indexed on the second field. We use next to prevent the other pattern{action} statements from running. Using NR==FNR allows us to slurp the output.log file completely.
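As a minimal sketch of that two-file idiom in isolation (the array name seen and the "matched"/"unmatched" labels are just illustrative):
gawk '
NR==FNR { seen[$2] = $1; next }                       # first file only: remember timestamp by ID
{ print $2, ($2 in seen ? "matched" : "unmatched") }  # second file: membership test
' output.log input.log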
!($2 in a) {print $2,$1; next}
Once the output.log file has been read completely, we move on to the input.log file. We check whether the second field of each input.log line is present in our array (i.e. in the output.log file). If it is not, we print the line. We continue this action until all such lines have been printed.
($2 in a) {"date +%s -d " $1 | getline var1; "date +%s -d " a[$2] | getline var2; var3=var2-var1; if (var3 > 4) print $2,$1,a[$2] }
Here we look for fields that are present in both files. When we find them, we need to apply our logic to calculate the difference. We run the external date command to convert each timestamp to epoch seconds. Since that command prints to STDOUT and we have no direct control over it, we pipe its output into awk's getline function and store it in a variable (var1 and var2). Once both dates are stored in variables, we take the difference and store it in var3; if var3 is found to be > 4, we print the line in the format you desire.
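On its own, the command-to-getline pattern looks roughly like this (a sketch, not part of the answer above; cmd and epoch are just placeholder names). Calling close() on the command string is a good habit so each pipe is released before the next record:
gawk '{
cmd = "date +%s -d \"" $1 "\""   # build the external date command for this record
cmd | getline epoch              # capture its single line of output
close(cmd)                       # release the pipe
print $1, epoch
}' input.log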
Here's the solution I went with:
cat input.log
2012-01-16T09:00:00 9
2012-01-16T10:00:00 10
2012-01-16T11:00:00 11
cat output.log
2012-01-16T10:00:04 10
2012-01-16T11:00:10 11
2012-01-16T12:00:00 12
sort -k2,2 input.log > input.sort
sort -k2,2 output.log > output.sort
join -a1 -j2 -o 0 1.1 2.1 input.sort output.sort | while read id i o; do
if [ -n "$o" ]; then
ot=$(date +%s -d "${o/T/ }")
it=$(date +%s -d "${i/T/ }")
[[ $it+5 -lt $ot ]] && echo $id $i $o
else echo $id $i
fi
done
11 2012-01-16T11:00:00 2012-01-16T11:00:10
9 2012-01-16T09:00:00
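A side note on the [[ $it+5 -lt $ot ]] test: it works because bash evaluates both sides of -lt as arithmetic expressions inside [[ ]]. An equivalent, more explicit form of that one line (a drop-in replacement inside the loop above) would be:
(( ot - it > 5 )) && echo $id $i $o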