
Bash: Removing duplicate lines with unique timestamps

Tags: bash, awk

Maybe you guys can help create a custom awk function for this application, as I have little experience with bash.

I have a log.txt with Unix timestamps (in seconds, from date +%s) and transaction ids, separated by a dash. If an id appears more than once, I want to remove its earlier entry and keep only the latest one. For example, this:

1396464155-640de058bac28a44b9fde9a6bbd4b5385588934a38ff543c004ecb94d47dc707
1396464330-640de058bac28a44b9fde9a6bbd4b5385588934a38ff543c004ecb94d47dc707
1396464330-a0ae54a927d49e53f66a511e065a3cc99a35ae2eac215f01d99ea9cc59447185

should be reduced to this:

1396464330-640de058bac28a44b9fde9a6bbd4b5385588934a38ff543c004ecb94d47dc707
1396464330-a0ae54a927d49e53f66a511e065a3cc99a35ae2eac215f01d99ea9cc59447185

Here is the script I'm using for logging. The solution should be handled here, possibly by checking if there is a duplicate transaction id before adding a new entry.

#!/bin/bash
F=./log.txt
D=$(date +%s)    # seconds since the epoch
echo "${D}-${1}" >> "${F}"
Liam Hogan asked Jan 26 '26 13:01



1 Answer

This awk command should do it:

awk -F- '{a[$2]=$1} END {for (i in a) print a[i] FS i}' file
1396464330-a0ae54a927d49e53f66a511e065a3cc99a35ae2eac215f01d99ea9cc59447185
1396464330-640de058bac28a44b9fde9a6bbd4b5385588934a38ff543c004ecb94d47dc707

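It stores each timestamp in an array a, using the id as the index. If an id has more than one timestamp, each later line overwrites the earlier one, so only the last timestamp in the file is kept. Since the log is appended to in time order, that is the latest one. Note that awk's for (i in a) loop visits entries in an unspecified order, so the output order can differ from the input order.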
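The same one-liner can be spelled out with comments to make the overwrite step visible. This is only a sketch: the function name latest_per_id and the temporary sample file are made up for illustration, and the sample data is the small test file used in this answer.

```shell
#!/bin/bash
# Hypothetical sample data written to a temporary file.
sample=$(mktemp)
printf '%s\n' 12-green 12-red 13-green 14-blue 15-orange 15-red 16-orange > "$sample"

# latest_per_id FILE: print one "timestamp-id" line per id, keeping the
# timestamp from the last line in which that id appears.
latest_per_id() {
    awk -F- '
        { latest[$2] = $1 }        # a later line overwrites the earlier entry for the same id
        END {
            for (id in latest)     # iteration order is unspecified in awk
                print latest[id] FS id
        }
    ' "$1"
}

latest_per_id "$sample"   # four lines, one per colour; order may vary
```

This assumes the id itself never contains a dash, which holds for the hex ids in the question.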

Test file

12-green
12-red
13-green
14-blue
15-orange
15-red
16-orange

awk -F- '{a[$2]=$1} END {for (i in a) print a[i] FS i}' file
16-orange
15-red
14-blue
13-green

To get the output sorted by timestamp, pipe it through sort:

awk -F- '{a[$2]=$1} END {for (i in a) print a[i] FS i | "sort -nt-"}' file
13-green
14-blue
15-red
16-orange
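If you would rather handle this in the logging script itself, as the question suggests, one option is to drop any existing line for the id before appending the new one. The following is a rough sketch, not a drop-in replacement: the function name log_unique and its optional second argument are my own invention, and it assumes a single writer (there is no locking) and ids without dashes.

```shell
#!/bin/bash
# log_unique ID [FILE]: hypothetical helper that appends "timestamp-ID"
# to FILE after removing any earlier line with the same ID.
log_unique() {
    local id=$1 file=${2:-./log.txt} tmp
    touch "$file"                      # make sure the log exists
    tmp=$(mktemp) || return 1
    # Keep only lines whose id field differs from the new id.
    # (id "") forces a string comparison even if the id looks numeric.
    awk -F- -v id="$id" '$2 != (id "")' "$file" > "$tmp"
    printf '%s-%s\n' "$(date +%s)" "$id" >> "$tmp"
    # Write back in place so the log keeps its permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Only log when the script is given an id, as in the original script.
if [ "$#" -gt 0 ]; then
    log_unique "$1"
fi
```

Because the new entry is always appended last, the file stays in timestamp order and no sorting is needed. The trade-off is that the whole log is rewritten on every call, which is fine for small files but slower than a plain append for large ones.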
Jotne answered Jan 29 '26 07:01
