 

awk command to keep only rows whose key occurs three or more times

Tags:

awk

I have a tab-separated data set like this:

A  B  C  D
1  aaa 1 2
1  aaa 3 4
1  aaa 5 6
1  bbb 7 8
1  ccc 9 1
1  ccc 2 3
1  ddd 4 5
1  ddd 6 7
1  ddd 8 9
1  ddd 1 2

Desired output:

A  B  C  D
1  aaa 1 2
1  aaa 3 4
1  aaa 5 6
1  ddd 4 5
1  ddd 6 7
1  ddd 8 9
1  ddd 1 2

I have tried this:

awk '++a[$2]>3' test.tsv test.tsv > test-2.tsv

Unwanted output:

1   ddd 1   2
1   aaa 1   2
1   aaa 3   4
1   aaa 5   6
1   ccc 2   3
1   ddd 4   5
1   ddd 6   7
1   ddd 8   9
1   ddd 1   2
ersan asked May 07 '26 03:05



1 Answer

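You may try this two-pass awk. The first pass counts how often each value of column 2 occurs, and the second pass prints the rows whose value occurs at least three times (test.tsv{,} is bash brace expansion for test.tsv test.tsv):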

awk -F '\t' 'FNR==NR {freq[$2]++; next} freq[$2] >= 3' test.tsv{,}

1  aaa 1 2
1  aaa 3 4
1  aaa 5 6
1  ddd 4 5
1  ddd 6 7
1  ddd 8 9
1  ddd 1 2
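
The output above has no header line: the header's column 2 value, B, occurs only once, so the frequency test drops it, whereas the desired output in the question keeps it. A minimal sketch of one way to retain it, assuming the header is always the first line of the file. The printf lines only recreate the sample data from the question, and test-2.tsv is the output name used in the question's attempt:

```shell
# Recreate the sample data (tab-separated) so the example is self-contained.
printf 'A\tB\tC\tD\n' > test.tsv
printf '1\taaa\t1\t2\n1\taaa\t3\t4\n1\taaa\t5\t6\n' >> test.tsv
printf '1\tbbb\t7\t8\n1\tccc\t9\t1\n1\tccc\t2\t3\n' >> test.tsv
printf '1\tddd\t4\t5\n1\tddd\t6\t7\n1\tddd\t8\t9\n1\tddd\t1\t2\n' >> test.tsv

# Pass 1 (FNR==NR is true only while reading the first file argument):
#   count each value of column 2, then skip to the next line.
# Pass 2: print the header (assumed to be line 1, FNR==1) unconditionally,
#   then every row whose column 2 value was seen 3 or more times.
awk -F '\t' 'FNR==NR {freq[$2]++; next} FNR==1 || freq[$2] >= 3' test.tsv test.tsv > test-2.tsv
```

Writing test.tsv test.tsv out in full instead of test.tsv{,} keeps the command usable in shells that lack brace expansion.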
anubhava answered May 11 '26 15:05
