Remove duplicate lines based on a partial line comparison

Question

I have a text file that contains thousands of lines of text as below.

123 hello world
124 foo bar
125 hello world

I would like to test for duplicates by checking a sub-section of the line. For the above it should output:

123 hello world
124 foo bar

Is there a vim command that can do this?

Update: I am on a Windows machine, so I can't use uniq.

kev · Accepted Answer

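In bash, you can do this with sort and uniq. sort -k2 orders the lines by everything from the second field onward, and uniq -s4 skips the first 4 characters of each line when comparing neighbours: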

sort -k2 input | uniq -s4
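
As a quick check, the pipeline can be run against the three sample lines from the question. This is only a sketch: the scratch file name input and the result variable are placeholders, and -s4 assumes every line starts with a three-digit number followed by a single space.

```shell
# Recreate the sample from the question in a scratch file (the name is arbitrary).
printf '%s\n' '123 hello world' '124 foo bar' '125 hello world' > input

# Sort on field 2 onward so lines with the same text become adjacent,
# then keep the first line of each run, ignoring the first 4 characters.
result=$(sort -k2 input | uniq -s4)
printf '%s\n' "$result"
```

Because of the sort, the surviving lines come out as 124 foo bar followed by 123 hello world, not in their original order.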

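In Vim, you can call the external command above. Note that % expands to the name of the current file, so sort reads the copy on disk; save the buffer first: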

:%!sort -k2 % | uniq -s4

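Actually, you can sort inside Vim itself. With a pattern, :sort skips the matched text (here the leading number and the space) and sorts on what follows it: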

:sort /^\d*\s/

After sorting, use this command to remove the duplicate lines:

:%s/\v(^\d*\s(.*)$\n)(^\d*\s\2$\n)+/\1/

To avoid excessive backslash escaping, I use \v in the pattern to turn on "very magic" mode.
In a multi-line pattern, $ matches the position right before the newline (\n). I don't think it's necessary here, though.
You can craft your own regex.
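
The commands above all sort first, so the result is ordered by the text rather than by the original line order shown in the question. If the original order matters and awk happens to be available, one possible alternative (a swapped-in technique, not part of this answer) is to remember each key the first time it appears. The file name input, the deduped variable, and the fixed 4-character prefix are assumptions for illustration.

```shell
# Recreate the sample from the question in a scratch file (the name is arbitrary).
printf '%s\n' '123 hello world' '124 foo bar' '125 hello world' > input

# Use everything from the 5th character onward as the comparison key,
# and print a line only the first time its key is seen.
deduped=$(awk '{ key = substr($0, 5) } !seen[key]++' input)
printf '%s\n' "$deduped"
```

On the sample this keeps 123 hello world and 124 foo bar, in that order, and needs no prior sort.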
