I am aiming to identify and keep DUPLICATE, TRIPLICATE, etc. lines, i.e., all lines that occur more than once in Notepad++? In other words, how can I delete all unique lines only?
For example, here are seven (7) separate lists and the desired true duplicate lines of each lists (shown as 7 columns, regard each column as an individual list or file!). (The lists here are shown side by side only to save space, in real life, each of the 7 lists occurs alone and independently from the others and are separate files!)
list1 list2 list3 list4 list5 list6 list7
1 0 0 0 0 0 0
2 1 1 1 1 1 1
3 2 2 2 2 2 2
4 3 3 3 3 3 3
4 4 4 4 4 4 4
4 4 4 4 4 4 4
5 4 4 4 4 4 4
6 5 5 5 5 5 5
7 5 5 5 5 5 5
8 6 6 6 6 6 6
9 6 6 6 6 6 6
abc 7 7 7 7 7 7
abd 8 8 8 8 8 8
abd 9 9 9 9 9 9
abe <CR> 9 9 9 9
<CR> 99 99
<CR>
[Lines of multiple occurence of above lists:]
4 4 4 4 4 4 4
4 4 4 4 4 4 4
4 4 4 4 4 4 4
abd 5 5 5 5 5 5
abd 5 5 5 5 5 5
6 6 6 6 6 6
6 6 6 6 6 6
9 9 9 9
9 9 9 9
There are many solutions to eliminate duplicates (e.g., TextFX; notepad++ delete duplicate and original lines to keep unique lines), I can not find solutions to keep duplicates only.
((.*)\R(\2\R)+)*\K.+\R
@Lars Fischer: This script works nearly OK, except the last entry of the (presorted) list needs to be unique line followed by a <CR>
empty line. One (suboptimal) workaround is to insert an artificial (helper) unique line (e.g., zzz) followed by an empty line <CR>
as the last two lines.
(END OF QUESTION)
UPDATE 3: This question is reposted per stackoverflow "ask a new question" instruction. (@AdrianHHH, @B. Desai, @Paolo Forgia, @greg-449, @Erik von Asmuth draw the incorrect conclusion that this question is a duplicate of notepad++ delete duplicate and original lines to keep unique lines. This question is definitely not a duplicate of the one @AdrianHHH et al quotes.
UPDATE 2: @AdrianHHH This question is not less "broad" (in fact, one can hardly be more specific) or less researched than other Notepad++ questions, including the one https://stackoverflow.com/questions/29303148 cited (wrongly) by @AdrianHHH et al. as the same question.
UPDATE: @AdrianHHH, @B. Desai, @Paolo Forgia, @greg-449, @Erik von Asmuth This questions is different from: https://stackoverflow.com/questions/29303148 beacuse Q 29303148 is (i) neither asking how to identify and keep only the lines of multiple occurrence, (ii) neither there is a solution provided in the answers for that. Q 29303148 asks "...I just need the unique lines."
Remove duplicate lines with uniq If you don't need to preserve the order of the lines in the file, using the sort and uniq commands will do what you need in a very straightforward way. The sort command sorts the lines in alphanumeric order. The uniq command ensures that sequential identical lines are reduced to one.
Here is a solution based on regular Expressions and bookmarks, it works for a sorted file (i.e. each duplicated line is followed by its duplicates):
((.*)\R(\2\R?)+)*\K.*
. matches newline
Explanation
The regular expression is made up of three parts:
((.*)\R(\2\R?)+)*
: this is an optional block of duplicates consisting of one ore more line blocks
( ... )*
matches zero or more such blocks of duplicated lines (if in your example the three 4 would be followed by two 5 we will need a concept of sequences of duplicate blocks) (.*)\R(\2\R?)+
: \2
references the content of (.*)
: this are all duplicates of one line\R
is an optional ( due to the ?
) linebreak. Thus it is possible to match a duplicate in the last line of the file if that line does not end with a linebreak If there is a block of duplicated lines after the cursor position from which you start, this will match it.
now \K
discards what we have matched so far (the duplicates) and "puts the cursor" before the first unique line
.*
matches the next (unique) line and bookmarks itUsing Mark All we bookmark all such unique lines, so that we can remove them using the Entry from the Search -> Bookmark menu.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With