Given a long text file like this one (that we will call file.txt):
EDITED
1 AA
2 ab
3 azd
4 ab
5 AA
6 aslmdkfj
7 AA
How to delete the lines that appear at least twice in the same file in bash? What I mean is that I want to have this result:
1 AA
2 ab
3 azd
6 aslmdkfj
I do not want to have the same lines in double, given a specific text file. Could you show me the command please?
Assuming whitespace is significant, the typical solution is:
awk '!x[$0]++' file.txt
(eg, The line "ab " is not considered the same as "ab". It is probably simplest to pre-process the data if you want to treat whitespace differently.)
--EDIT-- Given the modified question, which I'll interpret as only wanting to check uniqueness after a given column, try something like:
awk '!x[ substr( $0, 2 )]++' file.txt
This will only compare columns 2 through the end of the line, ignoring the first column. This is a typical awk idiom: we are simply building an array named x (one letter variable names are a terrible idea in a script, but are reasonable for a one-liner on the command line) which holds the number of times a given string is seen. The first time it is seen, it is printed. In the first case, we are using the entire input line contained in $0. In the second case we are only using the substring consisting of everything including and after the 2nd character.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With