
How to delete rows whose columns 2 and 3 match those of a previous row using awk?

Tags:

shell

awk

I have a file with 4 columns:

ifile.txt
3  5  2  2
1  4  2  1
4  5  7  2 
5  5  7  1 
0  0  1  1
3  5  7  3
5  4  2  2

I would like to delete every row whose column 2 and column 3 values are the same as those of some previous row. For instance, rows 2 and 7 have the same values in columns 2 and 3. Similarly, rows 3, 4 and 6 have the same values in columns 2 and 3. So I want to keep the 2nd row and delete the 7th row, and likewise keep the 3rd row and delete the 4th and 6th rows. My desired output is:

ofile.txt
3  5  2  2
1  4  2  1
4  5  7  2
0  0  1  1

I tried this command:

awk '{a[NR]=$2""$3} a[NR]!=a[NR-1]{print}' ifile.txt > ofile.txt

But it does not give my desired output.

Kay asked Jan 06 '23 04:01


1 Answer

$ awk '!(($2,$3) in a); {a[$2,$3]}' ifile.txt
3  5  2  2
1  4  2  1
4  5  7  2
0  0  1  1

How it works

awk reads the input file one line at a time. Each input line is divided into fields. In this case, the important fields are the second, denoted $2, and the third, denoted $3.
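To make the field splitting concrete, here is a minimal sketch that prints each row number together with its (column 2, column 3) pair. The output file name pairs.txt is only an illustrative choice and is not part of the original answer.

```shell
# Recreate the sample input from the question.
cat > ifile.txt <<'EOF'
3  5  2  2
1  4  2  1
4  5  7  2
5  5  7  1
0  0  1  1
3  5  7  3
5  4  2  2
EOF

# NR is the current line number; $2 and $3 are the second and third fields.
awk '{ print NR ": (" $2 "," $3 ")" }' ifile.txt > pairs.txt
cat pairs.txt
```

In the listing, rows 2 and 7 share the pair (4,2), and rows 3, 4 and 6 share the pair (5,7), which are exactly the duplicates described in the question.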

  • !(($2,$3) in a)

    This condition is true if the pair $2,$3 is not yet a key in the associative array a. Since no action is specified, the default action is performed when the condition is true, which is to print the line.

    In more detail, ($2,$3) in a is true when $2,$3 is already a key of a. We want the opposite: the condition should be true only when the key has not been seen before. Consequently, we apply awk's negation operator, !, to it.

  • a[$2,$3]

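    This adds $2,$3 as a key of a. In awk, merely referencing an array element creates it, so no assignment is needed. Because this block has no condition, it runs for every line, after the test above has already been evaluated.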
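The two steps above can also be written out in expanded form, as a rough sketch with comments. It is followed by a widely used shorthand that should behave the same way here. The array name seen and the file name ofile_short.txt are illustrative choices, not part of the original answer.

```shell
# Recreate the sample input from the question.
cat > ifile.txt <<'EOF'
3  5  2  2
1  4  2  1
4  5  7  2
5  5  7  1
0  0  1  1
3  5  7  3
5  4  2  2
EOF

# Expanded form of the one-liner: print a line only if its ($2,$3)
# pair is new, then record the pair.
awk '{
    if (!(($2,$3) in a))   # pair not seen on any earlier line
        print              # keep the first occurrence
    a[$2,$3]               # referencing the element creates the key
}' ifile.txt > ofile.txt

# Shorthand: the post-increment yields 0 (false) only the first time
# a pair appears, so the negation is true only for first occurrences.
awk '!seen[$2,$3]++' ifile.txt > ofile_short.txt

# Both variants should produce identical files.
cmp ofile.txt ofile_short.txt && cat ofile.txt
```

The shorthand stores a count per pair rather than an empty element, so it uses the same amount of bookkeeping but folds the test and the update into one expression.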

John1024 answered Apr 22 '23 10:04