
Bash - compare 2 text files and find missing lines

I've got 2 log files generated by a traffic generator. The format of the logs is:

[packet ID][Hour Tx][Min Tx][Sec Tx][Hour Rx][Min Rx][Sec Rx][packet size][flow number]

The first file is the sender log:

  1 13 15 17.799915 13 15 17.799915 512 1
  2 13 15 17.800016 13 15 17.800016 512 1
  3 13 15 17.800034 13 15 17.800034 512 1
  4 13 15 17.800050 13 15 17.800050 512 1
  5 13 15 17.800081 13 15 17.800081 512 1
  6 13 15 17.800094 13 15 17.800094 512 1
  7 13 15 17.800117 13 15 17.800117 512 1
  8 13 15 17.800126 13 15 17.800126 512 1
  9 13 15 17.800135 13 15 17.800135 512 1
 10 13 15 17.800157 13 15 17.800157 512 1
 11 13 15 17.800166 13 15 17.800166 512 1
 12 13 15 17.800173 13 15 17.800173 512 1
 13 13 15 17.800181 13 15 17.800181 512 1
 14 13 15 17.800202 13 15 17.800202 512 1
 15 13 15 17.800212 13 15 17.800212 512 1
 16 13 15 17.800220 13 15 17.800220 512 1
 17 13 15 17.800228 13 15 17.800228 512 1
 18 13 15 17.800257 13 15 17.800257 512 1
 19 13 15 17.800266 13 15 17.800266 512 1
 20 13 15 17.800274 13 15 17.800274 512 1
 21 13 15 17.800297 13 15 17.800297 512 1
 22 13 15 17.800305 13 15 17.800305 512 1
 23 13 15 17.800313 13 15 17.800313 512 1
 24 13 15 17.800321 13 15 17.800321 512 1
 25 13 15 17.800343 13 15 17.800343 512 1
 26 13 15 17.800351 13 15 17.800351 512 1
 27 13 15 17.800359 13 15 17.800359 512 1
 28 13 15 17.800367 13 15 17.800367 512 1
 29 13 15 17.800387 13 15 17.800387 512 1
 30 13 15 17.800397 13 15 17.800397 512 1
 31 13 15 17.800404 13 15 17.800404 512 1
 32 13 15 17.800414 13 15 17.800414 512 1
 33 13 15 17.800436 13 15 17.800436 512 1
 34 13 15 17.800444 13 15 17.800444 512 1
 35 13 15 17.800452 13 15 17.800452 512 1
 36 13 15 17.800460 13 15 17.800460 512 1
 37 13 15 17.800483 13 15 17.800483 512 1
 38 13 15 17.800491 13 15 17.800491 512 1
 39 13 15 17.800499 13 15 17.800499 512 1
 40 13 15 17.800507 13 15 17.800507 512 1

and it continues for several thousand lines.

The second file is the receiver file:

  1 13 15 17.799915 13 15 17.800965 512 1
  3 13 15 17.800034 13 15 17.801605 512 1
  5 13 15 17.800081 13 15 17.802808 512 1
  7 13 15 17.800117 13 15 17.811653 512 1
  8 13 15 17.800126 13 15 17.811686 512 1
  9 13 15 17.800135 13 15 17.811992 512 1
 11 13 15 17.800166 13 15 17.812425 512 1
 13 13 15 17.800181 13 15 17.812966 512 1
 15 13 15 17.800212 13 15 17.814371 512 1
 17 13 15 17.800228 13 15 17.814813 512 1
 19 13 15 17.800266 13 15 17.815244 512 1
 21 13 15 17.800297 13 15 17.815804 512 1
 23 13 15 17.800313 13 15 17.816314 512 1
 25 13 15 17.800343 13 15 17.816805 512 1
 27 13 15 17.800359 13 15 17.817385 512 1
 29 13 15 17.800387 13 15 17.817930 512 1
 31 13 15 17.800404 13 15 17.819176 512 1
 33 13 15 17.800436 13 15 17.819654 512 1
 35 13 15 17.800452 13 15 17.820115 512 1
 37 13 15 17.800483 13 15 17.820649 512 1
 39 13 15 17.800499 13 15 17.821185 512 1
 41 13 15 17.800528 13 15 17.821781 512 1
 43 13 15 17.800545 13 15 17.822329 512 1
 45 13 15 17.800573 13 15 17.822976 512 1
 47 13 15 17.800590 13 15 17.824001 512 1
 49 13 15 17.800619 13 15 17.824448 512 1
 51 13 15 17.800738 13 15 17.824963 512 1
 53 13 15 17.800772 13 15 17.828931 512 1
 55 13 15 17.800788 13 15 17.829416 512 1
 57 13 15 17.801005 13 15 17.829820 512 1
 59 13 15 17.801035 13 15 17.830404 512 1
 61 13 15 17.801053 13 15 17.830873 512 1
 63 13 15 17.801088 13 15 17.831448 512 1
 65 13 15 17.801106 13 15 17.832285 512 1
 67 13 15 17.801225 13 15 17.832860 512 1
 69 13 15 17.801243 13 15 17.833318 512 1
 71 13 15 17.801274 13 15 17.833921 512 1
 73 13 15 17.801290 13 15 17.834448 512 1
 75 13 15 17.801321 13 15 17.834983 512 1
 77 13 15 17.801339 13 15 17.835492 512 1

and it continues for several thousand lines.

The first column of the second file is not necessarily ordered. As you've probably seen, lines of the 2 files starting with the same ID are not equal (the timestamps are different).

I'd like to isolate the packets (lines) that are in the first file but missing from the second. That is, I'd like to know the timestamps of packets that were sent but not received. The primary key of both files is the first column (the ID of the packet sent). I tried sort and join, but I couldn't get the results I wanted.

Thank you

condorwasabi asked Feb 07 '26 10:02


1 Answer

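You can use this awk script for that. It reads the receiver log (file2) first and records every packet ID it contains, then prints the ID and the Tx seconds field of each sender-log line (file1) whose ID was never recorded: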

awk 'FNR==NR{a[$1]=$0;next} !($1 in a) {print $1, $4}' file2 file1
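
A minimal sketch of the same two-pass idea, extended to print the full Tx timestamp (hour, minute, second) instead of only the seconds field. The function name missing_packets, the file names, and the four-line sample logs are made up for illustration; only the nine-column layout comes from the question.

```shell
#!/bin/sh
# Hypothetical helper: missing_packets RECEIVER_LOG SENDER_LOG
# Pass 1 (receiver log): remember every packet ID that arrived.
# Pass 2 (sender log): print ID and Tx hour:min:sec for IDs never seen.
missing_packets() {
    awk 'FNR == NR { seen[$1]; next }
         !($1 in seen) { printf "%s %s:%s:%s\n", $1, $2, $3, $4 }' "$1" "$2"
}

# Tiny sample logs in the same format (illustrative data only).
dir=$(mktemp -d)
cat > "$dir/sender.log" <<'EOF'
  1 13 15 17.799915 13 15 17.799915 512 1
  2 13 15 17.800016 13 15 17.800016 512 1
  3 13 15 17.800034 13 15 17.800034 512 1
  4 13 15 17.800050 13 15 17.800050 512 1
EOF
# The receiver log is deliberately out of order; the lookup does not care.
cat > "$dir/receiver.log" <<'EOF'
  3 13 15 17.800034 13 15 17.801605 512 1
  1 13 15 17.799915 13 15 17.800965 512 1
EOF

missing_packets "$dir/receiver.log" "$dir/sender.log"
rm -rf "$dir"
```

Because the receiver IDs are held in an awk array, neither file needs to be sorted, which sidesteps the ordering requirement that makes join awkward here. One caveat: the FNR==NR test assumes the receiver log is not empty; with an empty first file, awk would treat the sender log as the first pass and print nothing.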
anubhava answered Feb 09 '26 09:02


