I have a tab-separated data set like this:
A B C D
1 aaa 1 2
1 aaa 3 4
1 aaa 5 6
1 bbb 7 8
1 ccc 9 1
1 ccc 2 3
1 ddd 4 5
1 ddd 6 7
1 ddd 8 9
1 ddd 1 2
Desired output (only rows whose B value occurs at least 3 times):
A B C D
1 aaa 1 2
1 aaa 3 4
1 aaa 5 6
1 ddd 4 5
1 ddd 6 7
1 ddd 8 9
1 ddd 1 2
I have tried this:
awk '++a[$2]>3' test.tsv test.tsv > test-2.tsv
Unwanted output:
1 ddd 1 2
1 aaa 1 2
1 aaa 3 4
1 aaa 5 6
1 ccc 2 3
1 ddd 4 5
1 ddd 6 7
1 ddd 8 9
1 ddd 1 2
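You may try this two-pass awk. The first pass counts how often each value of column B occurs; the second pass prints only the rows whose count is at least 3: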
awk -F '\t' 'FNR==NR {freq[$2]++; next} freq[$2] >= 3' test.tsv{,}
Output (the header row is dropped, because "B" itself occurs only once in column 2):
1 aaa 1 2
1 aaa 3 4
1 aaa 5 6
1 ddd 4 5
1 ddd 6 7
1 ddd 8 9
1 ddd 1 2
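The attempt in the question fails because `++a[$2]` keeps incrementing across both reads of the file, so the counts from the first read carry over into the second and rows such as `1 ccc 2 3` slip through. If the header row should be kept as in the desired output, a minimal sketch of a variant is shown below. It assumes the input is named test.tsv and the result goes to test-2.tsv, as in the question, and it names the file twice instead of relying on the bash-only `test.tsv{,}` brace expansion.

```shell
# Recreate the sample input from the question (tab-separated).
printf 'A\tB\tC\tD\n' > test.tsv
printf '1\taaa\t1\t2\n1\taaa\t3\t4\n1\taaa\t5\t6\n' >> test.tsv
printf '1\tbbb\t7\t8\n1\tccc\t9\t1\n1\tccc\t2\t3\n' >> test.tsv
printf '1\tddd\t4\t5\n1\tddd\t6\t7\n1\tddd\t8\t9\n1\tddd\t1\t2\n' >> test.tsv

# Pass 1 (FNR == NR is true only while reading the first file argument):
#   count how often each value of column 2 occurs, then skip to the next line.
# Pass 2: print the header (first line of the file) and every row whose
#   column-2 value was seen at least 3 times.
awk -F '\t' '
  FNR == NR { freq[$2]++; next }
  FNR == 1 || freq[$2] >= 3
' test.tsv test.tsv > test-2.tsv

cat test-2.tsv
```

The `FNR == 1` condition is only reached on the second pass, since the first block ends with `next`, so the header is printed exactly once.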