Maybe you can help me create a custom awk command for this application, as I have little experience with bash.
I have a log.txt with Unix timestamps and transaction ids (separated by a dash). When the same id appears more than once, I want to remove the earlier entry, so this
1396464155-640de058bac28a44b9fde9a6bbd4b5385588934a38ff543c004ecb94d47dc707
1396464330-640de058bac28a44b9fde9a6bbd4b5385588934a38ff543c004ecb94d47dc707
1396464330-a0ae54a927d49e53f66a511e065a3cc99a35ae2eac215f01d99ea9cc59447185
To this
1396464330-640de058bac28a44b9fde9a6bbd4b5385588934a38ff543c004ecb94d47dc707
1396464330-a0ae54a927d49e53f66a511e065a3cc99a35ae2eac215f01d99ea9cc59447185
Here is the script I'm using for logging. Ideally the solution would be handled here, e.g. by checking for a duplicate transaction id before appending a new entry.
#!/bin/bash
F=./log.txt
D=$(date +%s)
echo "${D}-${1}" >> "${F}"
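If you want to handle it in the logging script itself, one sketch is to strip any earlier entry for the id before appending. This assumes the id passed in $1 contains no regex metacharacters (true for the hex ids shown above):

```shell
#!/bin/bash
# Sketch: drop any earlier entry for this id before appending.
F=./log.txt
D=$(date +%s)
if [ -f "$F" ]; then
    # grep exits 1 when every line is removed, so don't chain mv with &&.
    grep -v -- "-${1}$" "$F" > "${F}.tmp" || true
    mv "${F}.tmp" "$F"
fi
echo "${D}-${1}" >> "$F"
```

With this in place the log never holds two entries for the same id, so no separate cleanup pass is needed.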
This awk should do it:
awk -F- '{a[$2]=$1} END {for (i in a) print a[i] FS i}' file
1396464330-a0ae54a927d49e53f66a511e065a3cc99a35ae2eac215f01d99ea9cc59447185
1396464330-640de058bac28a44b9fde9a6bbd4b5385588934a38ff543c004ecb94d47dc707
It saves the timestamp in an array a, using the id as the index. If there is more than one timestamp for an id, only the latest one is kept.
Test file:
12-green
12-red
13-green
14-blue
15-orange
15-red
16-orange
awk -F- '{a[$2]=$1} END {for (i in a) print a[i] FS i}' file
16-orange
15-red
14-blue
13-green
To get the output sorted by timestamp:
awk -F- '{a[$2]=$1} END {for (i in a) print a[i] FS i | "sort -nt-"}' file
13-green
14-blue
15-red
16-orange
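To apply this to log.txt itself, a sketch is to write the deduplicated output to a temporary file and then move it over the original (awk waits for the sort pipe to finish before exiting, so the temp file is complete when mv runs):

```shell
# Keep only the newest timestamp per id and rewrite the log in place.
awk -F- '{a[$2]=$1} END {for (i in a) print a[i] FS i | "sort -nt-"}' log.txt > log.tmp \
    && mv log.tmp log.txt
```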