Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Awk: using a file to filter another one (out.tr)

Tags:

awk

Help with awk, using a file to filter another one I have a main file:

...
17,466971 0,095185 17,562156 id 676
17,466971 0,096694 17,563665 id 677
17,466971 0,09816 17,565131 id 678
17,466971 0,099625 17,566596 id 679
17,466971 0,101091 17,568062 id 680
17,466971 0,016175 17,483146 id 681
17,466971 0,101793 17,568764 id 682
17,466971 0,10253 17,569501 id 683
38,166772 0,08125 38,248022 id 1572
38,166772 0,082545 38,249317 id 1573
38,233772 0,005457 38,239229 id 1574
38,233772 0,082113 38,315885 id 1575
38,299771 0,081412 38,381183 id 1576
38,299771 0,006282 38,306053 id 1577
38,299771 0,083627 38,383398 id 1578
38,299771 0,085093 38,384864 id 1579
38,299771 0,008682 38,308453 id 1580
38,299771 0,085094 38,384865 id 1581
...

I want to suppress/delete some lines based on this other file, last column (id) :

...
d 17.483146 1 0 udp 181 ------- 1 19.0 2.0 681
d 38.239229 1 0 udp 571 ------- 1 19.0 2.0 1574
d 38.306053 1 0 udp 1000 ------- 1 19.0 2.0 1577
d 38.308453 1 0 udp 1000 ------- 1 19.0 2.0 1580
d 38.372207 1 0 udp 546 ------- 1 19.0 2.0 1582
d 38.441845 1 0 udp 499 ------- 1 19.0 2.0 1585
d 38.505262 1 0 udp 616 ------- 1 19.0 2.0 1586
d 38.572324 1 0 udp 695 ------- 1 19.0 2.0 1588
d 38.639246 1 0 udp 597 ------- 1 19.0 2.0 1590
d 38.639758 1 0 udp 640 ------- 1 19.0 2.0 1591 
...

For the example above, the result would be:

17,466971 0,095185 17,562156 id 676
17,466971 0,096694 17,563665 id 677
17,466971 0,09816 17,565131 id 678
17,466971 0,099625 17,566596 id 679
17,466971 0,016175 17,483146 id 680
17,466971 0,101793 17,568764 id 682
17,466971 0,10253 17,569501 id 683
38,166772 0,08125 38,248022 id 1572
38,166772 0,082545 38,249317 id 1573
38,233772 0,082113 38,315885 id 1575
38,299771 0,081412 38,381183 id 1576
38,299771 0,083627 38,383398 id 1578
38,299771 0,085093 38,384864 id 1579
38,299771 0,085094 38,384865 id 1581

The lines deletes were:

17,466971 0,101091 17,568062 id 681
38,233772 0,005457 38,239229 id 1574
38,299771 0,006282 38,306053 id 1577
38,299771 0,008682 38,308453 id 1580

Is there a command using awk to make this automatic?

Thank you in advance

like image 588
roberutsu Avatar asked Dec 27 '12 22:12

roberutsu


1 Answers

Here's one way using awk:

awk 'FNR==NR { a[$NF]; next } !($NF in a)' other main

Results:

17,466971 0,095185 17,562156 id 676
17,466971 0,096694 17,563665 id 677
17,466971 0,09816 17,565131 id 678
17,466971 0,099625 17,566596 id 679
17,466971 0,101091 17,568062 id 680
17,466971 0,101793 17,568764 id 682
17,466971 0,10253 17,569501 id 683
38,166772 0,08125 38,248022 id 1572
38,166772 0,082545 38,249317 id 1573
38,233772 0,082113 38,315885 id 1575
38,299771 0,081412 38,381183 id 1576
38,299771 0,083627 38,383398 id 1578
38,299771 0,085093 38,384864 id 1579
38,299771 0,085094 38,384865 id 1581

Drop the exclamation mark to show the 'deleted' lines:

awk 'FNR==NR { a[$NF]; next } $NF in a' other main

Results:

17,466971 0,016175 17,483146 id 681
38,233772 0,005457 38,239229 id 1574
38,299771 0,006282 38,306053 id 1577
38,299771 0,008682 38,308453 id 1580

Alternatively, if you'd like two files, one containing values 'present' and the other containing values 'deleted', try:

awk 'FNR==NR { a[$NF]; next } { print > ($NF in a ? "deleted" : "present") }' other main

Explanation1:

FNR==NR { ... } is a commonly used construct that returns true for only the first file in the arguments list. In this case, awk will read the file 'other' first. When this file is being processed, the value in the last column ($NF) is added to an array (which we have called a). next then skips processing the rest of our code. Once the first file has been read, FNR will no longer be equal to NR, thus awk will be 'allowed' to skip the FNR--NR { ... } block and begin processing the remainder of the code which is applied to the second file in the arguments list, 'main'. For example, !($NF in a), will not print the line if $NF is not in the array.

Explanation2:

With regards to which column, you may find this helpful:

$1         # the first column
$2         # the second column
$3         # the third column

$NF        # the last column
$(NF-1)    # the second last column
$(NF-2)    # the third last column
like image 140
Steve Avatar answered Sep 20 '22 23:09

Steve