
How to delete duplicate lines in bash

Tags: linux, bash, line

Given a long text file like this one (that we will call file.txt):

EDITED

1 AA
2 ab
3 azd
4 ab
5 AA
6 aslmdkfj
7 AA

How can I delete the lines that appear more than once in the same file in bash, keeping only the first occurrence? What I mean is that I want to have this result:

1 AA
2 ab
3 azd
6 aslmdkfj

I do not want the same line to appear twice in a given text file. Could you show me the command, please?

user1619114 asked Nov 16 '25 20:11


1 Answer

Assuming whitespace is significant, the typical solution is:

awk '!x[$0]++' file.txt

(E.g., the line "ab " is not considered the same as "ab". It is probably simplest to pre-process the data if you want to treat whitespace differently.)
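As an illustration of that kind of pre-processing, here is a minimal sketch that strips trailing spaces and tabs before the uniqueness test. It assumes only trailing whitespace should be ignored; the inline sample data and the variable name result are made up for this example and are not part of the question.

```shell
# Hypothetical input: "ab " (trailing space), "ab" and "ab<TAB>" should all
# count as the same line once trailing whitespace is removed.
result=$(printf 'ab \nab\nAA\nab\t\n' |
    awk '{ sub(/[ \t]+$/, "") } !x[$0]++')

printf '%s\n' "$result"
```

Note that sub() modifies $0 here, so the lines are printed without their trailing whitespace. If the original spacing must be kept in the output, the stripped text would need to go into a separate variable that is used only as the array key.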

--EDIT-- Given the modified question, which I'll interpret as only wanting to check uniqueness after a given column, try something like:

awk '!x[ substr( $0, 2 )]++' file.txt

This compares only the text from the 2nd character through the end of the line, ignoring the first character (so it fits the sample only while the leading number is a single character). This is a typical awk idiom: we are simply building an array named x (one-letter variable names are a terrible idea in a script, but are reasonable for a one-liner on the command line) which holds the number of times a given string has been seen. The pattern !x[key]++ is true only while that count is still zero, so a line is printed the first time its key is seen and skipped afterwards. In the first case, the key is the entire input line contained in $0. In the second case, it is the substring consisting of the 2nd character and everything after it.
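To make the idiom more concrete, the sketch below spells out the substr one-liner in long form and runs it on the sample from the question. It then shows one possible variant for the case where the leading number can be longer than one character, which is an assumption about the data and not something the question states. The names sample, seen, key, by_char and by_field are chosen for this example only.

```shell
# Sample data from the question.
sample='1 AA
2 ab
3 azd
4 ab
5 AA
6 aslmdkfj
7 AA'

# Long form of: awk "!x[substr(\$0, 2)]++"
by_char=$(printf '%s\n' "$sample" | awk '{
    key = substr($0, 2)          # everything from the 2nd character on
    if (seen[key] == 0) print    # count is still 0: first occurrence
    seen[key]++                  # remember the key for later lines
}')
printf '%s\n' "$by_char"

# Variant (assumption: the first field is separated by spaces or tabs and
# may be several characters long): drop the whole first field from the key.
by_field=$(printf '9 AA\n10 AA\n11 ab\n' | awk '{
    key = $0
    sub(/^[^ \t]+[ \t]+/, "", key)   # remove first field and its separator
} !seen[key]++')
printf '%s\n' "$by_field"
```

The first command prints the four lines the question asks for. The second keeps "9 AA" and "11 ab" and drops "10 AA", which the character-based key would treat as unique because its text from the 2nd character on is "0 AA".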

William Pursell answered Nov 19 '25 09:11


