Find and KEEP all DUPLICATE lines (instead of unique lines) in a text file

Tags: text, list, sorting, duplicates, notepad++

I want to identify and keep the DUPLICATE, TRIPLICATE, etc. lines in Notepad++, i.e., all lines that occur more than once. In other words, how can I delete only the unique lines?

For example, here are seven (7) separate lists, followed by the desired duplicate lines of each list. They are shown as 7 columns; regard each column as an individual list or file. (The lists are shown side by side only to save space. In real life, each of the 7 lists stands alone, independent of the others, in a separate file.)

list1  list2  list3  list4  list5  list6  list7
1      0      0      0      0      0      0
2      1      1      1      1      1      1
3      2      2      2      2      2      2
4      3      3      3      3      3      3
4      4      4      4      4      4      4
4      4      4      4      4      4      4
5      4      4      4      4      4      4
6      5      5      5      5      5      5
7      5      5      5      5      5      5
8      6      6      6      6      6      6
9      6      6      6      6      6      6
abc    7      7      7      7      7      7
abd    8      8      8      8      8      8
abd    9      9      9      9      9      9
abe           <CR>   9      9      9      9
                            <CR>   99     99
                                          <CR>

[Lines of multiple occurrence in the above lists:]
4      4      4      4      4      4      4
4      4      4      4      4      4      4
4      4      4      4      4      4      4
abd    5      5      5      5      5      5
abd    5      5      5      5      5      5
       6      6      6      6      6      6
       6      6      6      6      6      6
                     9      9      9      9
                     9      9      9      9

There are many solutions for eliminating duplicates (e.g., TextFX; "notepad++ delete duplicate and original lines to keep unique lines"), but I cannot find a solution that keeps only the duplicates.
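To pin down the expected result independently of Notepad++, here is a minimal Python sketch of the desired behavior. It is not a Notepad++ solution; the function name `keep_repeated_lines` is made up for illustration, and the sample input is list1 from the table above.

```python
from collections import Counter


def keep_repeated_lines(text: str) -> str:
    """Return only the lines that occur more than once, in their original order.

    Hypothetical helper for illustration; works whether or not the input is sorted.
    """
    lines = text.splitlines()  # also copes with a missing final line break
    counts = Counter(lines)    # how often each distinct line occurs
    kept = [line for line in lines if counts[line] > 1]
    return "\n".join(kept)


# list1 from the example table (no trailing line break)
list1 = "1\n2\n3\n4\n4\n4\n5\n6\n7\n8\n9\nabc\nabd\nabd\nabe"
print(keep_repeated_lines(list1))
```

For list1 this prints the three 4 lines followed by the two abd lines, which matches the first column of the expected output.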

@Lars Fischer: The expression ((.*)\R(\2\R)+)*\K.+\R works nearly OK, except that the last entry of the (presorted) list must be a unique line followed by an empty line (<CR>). One (suboptimal) workaround is to append an artificial (helper) unique line (e.g., zzz) followed by an empty line (<CR>) as the last two lines.

(END OF QUESTION)

UPDATE 3: This question is reposted per the Stack Overflow "ask a new question" instruction. @AdrianHHH, @B. Desai, @Paolo Forgia, @greg-449, and @Erik von Asmuth drew the incorrect conclusion that this question is a duplicate of "notepad++ delete duplicate and original lines to keep unique lines". It is definitely not a duplicate of the question that @AdrianHHH et al. cite.

UPDATE 2: @AdrianHHH This question is no more "broad" (in fact, it can hardly be more specific) and no less researched than other Notepad++ questions, including https://stackoverflow.com/questions/29303148, which @AdrianHHH et al. wrongly cited as the same question.

UPDATE: @AdrianHHH, @B. Desai, @Paolo Forgia, @greg-449, @Erik von Asmuth This question is different from https://stackoverflow.com/questions/29303148 because Q 29303148 (i) does not ask how to identify and keep only the lines of multiple occurrence, and (ii) has no answer that provides a solution for that. Q 29303148 asks "...I just need the unique lines."

asked Oct 13 '17 09:10

user3026965

1 Answer

Here is a solution based on regular expressions and bookmarks. It works for a sorted file (i.e., each duplicated line is immediately followed by its duplicates):

- Open the Mark dialog (Search -> Mark...)
- Click Clear all marks on the right
- Check Bookmark line
- Check Wrap around
- Find what: ((.*)\R(\2\R?)+)*\K.*
- Check Regular expression and uncheck . matches newline
- Click Mark All
- Click Close
- Search -> Bookmark -> Remove Bookmarked Lines

Explanation

The regular expression is made up of three parts:

- ((.*)\R(\2\R?)+)* : this is an optional block of duplicates consisting of one or more line blocks
  - the outer ( ... )* matches zero or more such blocks of duplicated lines (if, in your example, the three 4s were followed by two 5s, we would need to match a sequence of duplicate blocks)
  - (.*)\R(\2\R?)+: \2 references the content of (.*), so this matches a line together with all of its duplicates
  - the second \R is an optional (due to the ?) line break. This makes it possible to match a duplicate in the last line of the file even if that line does not end with a line break

  If there is a block of duplicated lines after the cursor position from which you start, this part will match it.
- \K then discards what we have matched so far (the duplicates) and "puts the cursor" before the first unique line
- .* matches the next (unique) line, which gets bookmarked

Using Mark All, we bookmark all such unique lines so that we can remove them with the Remove Bookmarked Lines entry in the Search -> Bookmark menu.
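To cross-check the result outside Notepad++, the same idea can be sketched in Python. This is only an illustration and not what Notepad++ does internally: like the regular expression, it assumes a sorted file in which duplicates are adjacent, and it swaps the regex for `itertools.groupby`. The name `keep_duplicate_runs` is hypothetical.

```python
from itertools import groupby


def keep_duplicate_runs(sorted_text: str) -> str:
    """Keep only runs of two or more identical adjacent lines.

    Hypothetical helper; assumes duplicates are adjacent (sorted input),
    the same precondition the regular expression relies on.
    """
    lines = sorted_text.splitlines()  # a final line break is optional
    kept = []
    for _, run in groupby(lines):     # one group per run of equal adjacent lines
        run = list(run)
        if len(run) > 1:              # a run of length 1 is a unique line: drop it
            kept.extend(run)
    return "\n".join(kept)


# list6 from the question: ends with a unique line (99) and a final line break
list6 = "0\n1\n2\n3\n4\n4\n4\n5\n5\n6\n6\n7\n8\n9\n9\n99\n"
print(keep_duplicate_runs(list6))
```

For list6 this prints 4, 4, 4, 5, 5, 6, 6, 9, 9 on separate lines, which is what should remain in Notepad++ after the bookmarked lines are removed.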

answered Oct 02 '22 09:10

Lars Fischer
