I have text files with repeated exact lines of text, but I only want one of each. Imagine this text file:
AAAAA
AAAAA
AAAAA
BB
BBBBB
BBBBB
CCC
CCC
CCC
I would only need the following four lines from it:
AAAAA
BB
BBBBB
CCC
I'm using a text editor (EmEditor or Notepad++) that supports regex, not a programming language, so I need a solution that is purely a regular expression.
Any help?
EDIT: I checked the other thread that hsz mentioned and I'd like to make it clear that this one is not the same. Although both are about removing duplicate lines, the way to achieve it is different. I need pure regex, but the best answer in the other thread relies on a specific Notepad++ plug-in (which doesn't even ship with it any more), so it's not a regex solution at all. The second answer there is a regex and it does work in Notepad++, but not in EmEditor, which I also need. So I don't think my question is a duplicate of that one, although the link is useful, and so I thank hsz for it.
Two nearly identical options:
Match All Lines That Are Not Repeated
(?sm)(^[^\r\n]+$)(?!.*^\1$)
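This matches the last occurrence of each line, i.e. every line that has no identical line after it. Matching alone does not extract those lines, though; to be left with only one copy of each, you really want to replace the other ones.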
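If you want to check what this pattern matches outside the editor, here is a minimal Python sketch. The names LAST_OCCURRENCE, last_occurrences and sample are made up for the illustration, and it assumes LF line endings, because Python's $ in multi-line mode only recognises \n as a line break.

```python
import re

# Pattern from the answer: a non-empty line with no identical line after it.
# Assumption: the text uses LF ("\n") line endings.
LAST_OCCURRENCE = re.compile(r"(?sm)(^[^\r\n]+$)(?!.*^\1$)")


def last_occurrences(text):
    """Return each distinct line once, taken at its final occurrence."""
    # With a single capture group, findall returns the Group 1 text of each match.
    return LAST_OCCURRENCE.findall(text)


# The sample file from the question.
sample = "\n".join(
    ["AAAAA", "AAAAA", "AAAAA", "BB", "BBBBB", "BBBBB", "CCC", "CCC", "CCC"]
)

for line in last_occurrences(sample):
    print(line)
```

Because the last copy of each line is the one that is kept, the order of the result can differ from first-seen order when duplicates are not adjacent.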
Replace All Repeated Lines
This will work better in Notepad++:
Search: (?sm)(^[^\r\n]*)[\r\n](?=.*^\1$)
Replace: empty string
Explanation of the search pattern:
(?s) activates DOTALL mode, allowing the dot to match across lines
(?m) turns on multi-line mode, allowing ^ and $ to match on each line
(^[^\r\n]*) captures a line to Group 1, i.e.:
the ^ anchor asserts that we are at the beginning of a line
[^\r\n]* matches any chars that are not newline chars
[\r\n] matches the newline char that ends the line
The lookahead (?=.*^\1$) asserts that we can match any number of characters .*, then...
^\1$ the same line as Group 1, as a whole line (the closing $ keeps BB from counting as a repeat of BBBBB)
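To try the search-and-replace outside the editor, something like the following Python sketch should behave the same way on the sample from the question. The names DUPLICATE_LINE and remove_repeated_lines are hypothetical, LF line endings are assumed, and the lookahead here ends in $ so that BB is not removed just because a later line starts with BB.

```python
import re

# A line plus its newline, provided an identical whole line occurs later.
# Assumption: the text uses LF ("\n") line endings.
DUPLICATE_LINE = re.compile(r"(?sm)(^[^\r\n]*)[\r\n](?=.*^\1$)")


def remove_repeated_lines(text):
    """Delete every line that is repeated later, keeping the last copy."""
    # Equivalent to "Replace: empty string" in the editor.
    return DUPLICATE_LINE.sub("", text)


# The sample file from the question.
sample = "\n".join(
    ["AAAAA", "AAAAA", "AAAAA", "BB", "BBBBB", "BBBBB", "CCC", "CCC", "CCC"]
)

print(remove_repeated_lines(sample))
```

Editors differ in how ^, $ and CRLF line endings interact, so treat this only as a way to reason about the pattern, not as a guarantee of what EmEditor or Notepad++ will do.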