Is it possible to remove duplicate rows from a text file? If yes, how?
Some time ago I found an unexpectedly simple solution, but it unfortunately only works on Windows 10: the sort
command features some undocumented options that can be used:
/UNIQ[UE] – to output only unique lines;
/C[ASE_SENSITIVE] – to sort case-sensitively.
So use the following line of code to remove duplicate lines (remove /C
to do that in a case-insensitive manner):
sort /C /UNIQUE "incoming.txt" /O "outgoing.txt"
This removes duplicate lines from the text in incoming.txt
and writes the result to outgoing.txt
. Note that the original order is of course not going to be preserved (because, well, this is the main purpose of sort
).
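For comparison, the behavior of sort /UNIQUE (sort the lines, then emit each distinct line once) can be sketched in Python; this is a cross-platform illustration of the semantics, not the Windows command itself, and the case_sensitive parameter is my own name mirroring the /C switch:

```python
def sort_unique(lines, case_sensitive=True):
    # Mimics `sort /UNIQUE`: output sorted lines, keeping only the
    # first occurrence of each (comparison optionally case-insensitive,
    # as toggled by the /C switch on Windows 10).
    key = (lambda s: s) if case_sensitive else str.lower
    seen = set()
    result = []
    for line in sorted(lines, key=key):
        k = key(line)
        if k not in seen:
            seen.add(k)
            result.append(line)
    return result
```

As with the real command, the original input order is lost: the output is sorted.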
However, you should use these options with care, as there might be some (un)known issues with them; there is possibly a good reason why they are not documented (so far).
The Batch file below does what you want; note that it only removes consecutive duplicate lines, so sort the file first if duplicates may not be adjacent:
@echo off
setlocal EnableDelayedExpansion
set "prevLine="
for /F "delims=" %%a in (theFile.txt) do (
   if "%%a" neq "!prevLine!" (
      echo %%a
      set "prevLine=%%a"
   )
)
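The loop above emits a line only when it differs from the immediately preceding one, which is exactly what Unix uniq does. The same logic as a small Python sketch, for clarity:

```python
def uniq_adjacent(lines):
    # Emit each line only if it differs from the previous one,
    # mirroring the prevLine comparison in the batch loop:
    # non-adjacent duplicates are NOT removed.
    prev = None
    for line in lines:
        if line != prev:
            yield line
        prev = line
```

For example, the input a, a, b, a yields a, b, a: the second "a" is dropped, but the last one survives because it is not adjacent to another "a".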
If you need a more efficient method, try the Batch-JScript hybrid script below, which is written as a filter, similar to the Unix uniq
program. Save it with a .bat extension, for example uniq.bat
:
@if (@CodeSection == @Batch) @then
@CScript //nologo //E:JScript "%~F0" & goto :EOF
@end

var line, prevLine = "";
while ( ! WScript.Stdin.AtEndOfStream ) {
   line = WScript.Stdin.ReadLine();
   if ( line != prevLine ) {
      WScript.Stdout.WriteLine(line);
      prevLine = line;
   }
}
Both programs were copied from this post.