I'm trying to use VIM to remove a duplicate line in an XML file I created. (I can't recreate the file because the ID numbers will change.)
The file looks something like this:
<tag k="natural" v="water"/>
<tag k="nhd:fcode" v="39004"/>
<tag k="natural" v="water"/>
I'm trying to remove one of the duplicate k="natural" v="water" lines. When I try to use the \_ modifier to include newlines in my regex replaces, VIM doesn't seem to find anything.
Any tips on what regex or tool to use?
Remove duplicate lines with uniq
If you don't need to preserve the order of the lines in the file, the sort and uniq commands will do what you need in a very straightforward way. The sort command sorts the lines in alphanumeric order. The uniq command reduces each run of adjacent identical lines to one.
:%!sort | uniq will do just that: sort the buffer, collapse each set of duplicate lines to a single copy, and leave the result in the file. Don't add -u to uniq here: uniq -u prints only the lines that are never repeated, so both copies of a duplicated line would be dropped.
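To see how plain uniq and uniq -u differ, here is a small sketch run from a regular shell rather than from inside Vim. The sample text simply mirrors the three lines from the question; the variable names are made up for illustration.

```shell
# Sample input mirroring the three lines from the question.
sample='<tag k="natural" v="water"/>
<tag k="nhd:fcode" v="39004"/>
<tag k="natural" v="water"/>'

# sort | uniq: each group of identical lines collapses to one copy
# (the original line order is lost because of the sort).
deduped=$(printf '%s\n' "$sample" | sort | uniq)

# sort | uniq -u: only lines that occur exactly once survive,
# so both copies of the duplicated line disappear.
only_unique=$(printf '%s\n' "$sample" | sort | uniq -u)

printf 'uniq:\n%s\n\nuniq -u:\n%s\n' "$deduped" "$only_unique"
```

With this input, the first result keeps one natural/water line plus the nhd:fcode line, while the second keeps only the nhd:fcode line.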
First of all, you can use awk to remove all duplicate lines, keeping their order.
:%!awk '\!_[$0]++'
If you're not sure whether there are other duplicate lines that you don't want to remove, just add conditions.
:%!awk '\!(_[$0]++ && /tag/ && /natural/ && /water/)'
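The awk idiom can be tried outside Vim as a rough sketch. The four-line sample below is hypothetical (the question's lines plus a second nhd:fcode line), and the array is named seen instead of _ for readability. In a plain shell script the ! needs no backslash; the \! form is only needed on Vim's command line, where a bare ! is replaced by the previous shell command.

```shell
# Hypothetical sample: two different lines, each duplicated once.
sample='<tag k="natural" v="water"/>
<tag k="nhd:fcode" v="39004"/>
<tag k="natural" v="water"/>
<tag k="nhd:fcode" v="39004"/>'

# seen[$0]++ evaluates to how many times the current line appeared before,
# so !seen[$0]++ is true only on the first occurrence of each line.
all_deduped=$(printf '%s\n' "$sample" | awk '!seen[$0]++')

# Suppress a repeated line only if it is also the natural/water tag;
# other repeated lines are printed untouched.
some_deduped=$(printf '%s\n' "$sample" | awk '!(seen[$0]++ && /tag/ && /natural/ && /water/)')

printf 'all duplicates removed:\n%s\n\nonly natural/water deduplicated:\n%s\n' \
  "$all_deduped" "$some_deduped"
```

The first result keeps two lines in their original order; the second keeps three, because the repeated nhd:fcode line does not match the extra conditions.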
That said, parsing a nested structure like XML with regexes is a bad idea, IMHO.
You have to take care all the time that the structure doesn't get broken.
xmllint gives you a list of specific elements in the file:
:!echo "cat //tag[@k='natural' and @v='water']" | xmllint --shell %
You can then delete the duplicate lines one at a time.