
Remove duplicate lines based on a partial line comparison

Tags: vim

I have a text file that contains thousands of lines of text as below.

123 hello world
124 foo bar
125 hello world

I would like to detect duplicates by comparing only part of each line (here, everything after the leading number). For the above it should output:

123 hello world
124 foo bar

Is there a vim command that can do this?

Update: I am on a Windows machine, so I can't use uniq.

Bruno asked Nov 06 '12 15:11


1 Answer

From a bash shell, you can use:

sort -k2 input | uniq -s4
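  • sort -k2 sorts on everything from the 2nd field onward, so the leading number is ignored and lines with the same text end up next to each other
  • uniq -s4 skips the first 4 characters of each line (the 3-digit number plus the space) when comparing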
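
The pipeline above skips a fixed number of characters and changes the order of the lines. As a possible alternative that is not part of the original answer, here is a minimal sketch using awk, assuming awk is available and every line starts with a number followed by blanks. The function name dedupe_keep_first is hypothetical.

```shell
# Hypothetical helper (the name is illustrative, not from the original answer).
# Reads lines on stdin and prints the first line seen for each distinct
# text that follows the leading number.
dedupe_keep_first() {
  awk '{
    key = $0
    sub(/^[0-9]+[ \t]+/, "", key)   # drop the leading number and blanks
    if (!(key in seen)) {           # first time this text appears
      seen[key] = 1
      print                         # print the whole original line
    }
  }'
}

printf '123 hello world\n124 foo bar\n125 hello world\n' | dedupe_keep_first
# prints:
# 123 hello world
# 124 foo bar
```

Because the key is built with a regex rather than a character count, this also handles numbers of different widths, and the surviving lines stay in their original order.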

In Vim, you can call the external command above:

:%!sort -k2 % | uniq -s4
  • The 2nd % expands to the current file name, so sort reads the file on disk. Save the buffer first, or unsaved changes will not be included.

Alternatively, Vim can do the sorting itself, with no external tools:

:sort /^\d*\s/
  • Vim skips the text matched by the pattern (the leading number and the whitespace after it) and sorts on the rest of the line.

After sorting, use this command to remove the duplicate lines:

:%s/\v(^\d*\s(.*)$\n)(^\d*\s\2$\n)+/\1/
  • To avoid excessive backslash escaping, the pattern starts with \v to turn on "very magic" mode.
  • In a multi-line pattern, $ matches the position right before the newline (\n). I don't think it's necessary here, though.
  • You can adapt the regex to your own line format.
kev answered Nov 28 '22 07:11