Batch to remove duplicate rows from text file

Is it possible to remove duplicate rows from a text file? If yes, how?

Rocshy asked Jul 27 '12 14:07


2 Answers

Some time ago I found an unexpectedly simple solution, but unfortunately it only works on Windows 10: the sort command features some undocumented options that can be used:

  • /UNIQ[UE] to output only unique lines;
  • /C[ASE_SENSITIVE] to sort case-sensitively.

So use the following command to remove duplicate lines (omit /C to compare lines case-insensitively):

sort /C /UNIQUE "incoming.txt" /O "outgoing.txt"

This removes the duplicate lines from incoming.txt and writes the result to outgoing.txt. Note that the original order is of course not preserved (because, well, sorting is the main purpose of sort).

However, you should use these options with care, as there might be some (un)known issues with them; there is possibly a good reason why they are not documented (so far).

aschipfl answered Sep 20 '22 07:09


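The Batch file below does what you want, as long as the duplicate lines are adjacent (for example, because the file is already sorted), since it compares each line only with the previous one: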

@echo off
setlocal EnableDelayedExpansion
set "prevLine="
for /F "delims=" %%a in (theFile.txt) do (
   if "%%a" neq "!prevLine!" (
      echo %%a
      set "prevLine=%%a"
   )
)
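Because the loop only compares neighbouring lines, a minimal sketch for unsorted input is to sort the file first, assuming the original order does not need to be kept. The names theFile.txt and %TEMP%\sorted.txt are placeholders. It uses if /I because sort ignores letter case by default, so lines such as "Apple" and "apple" end up next to each other and are treated as duplicates. Like the original loop, it skips empty lines and lines that begin with ; (the default eol character of for /F), and delayed expansion mangles lines that contain !.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Placeholder names: theFile.txt is the input, %TEMP%\sorted.txt a scratch file.
rem Sorting first makes equal lines adjacent, so comparing each line with the
rem previous one is enough to catch every duplicate.
sort "theFile.txt" /O "%TEMP%\sorted.txt"
set "prevLine="
rem usebackq allows a quoted file name, and if /I ignores case as sort does by default.
for /F "usebackq delims=" %%a in ("%TEMP%\sorted.txt") do (
   if /I "%%a" neq "!prevLine!" (
      echo %%a
      set "prevLine=%%a"
   )
)
del "%TEMP%\sorted.txt"
```

The output is written to the screen, so redirect it (for example > result.txt) to keep it in a file.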

If you need a more efficient method, try this Batch-JScript hybrid script. It is written as a filter (it reads standard input and writes standard output), similar to the Unix uniq program. Save it with a .bat extension, for example uniq.bat:

@if (@CodeSection == @Batch) @then

@CScript //nologo //E:JScript "%~F0" & goto :EOF

@end

var line, prevLine = "";
while ( ! WScript.Stdin.AtEndOfStream ) {
   line = WScript.Stdin.ReadLine();
   if ( line != prevLine ) {
      WScript.Stdout.WriteLine(line);
      prevLine = line;
   }
}
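A usage sketch, assuming the hybrid script above was saved as uniq.bat in the current folder or on the PATH (theFile.txt and result.txt are placeholder names). As with Unix uniq, only adjacent duplicates are removed, so unsorted input is piped through sort first. Inside another batch file, the redirected form needs call so that control returns to the caller; the right-hand side of a pipe already runs in its own cmd instance.

```batch
rem Input whose duplicate lines are already adjacent (for example, a sorted file):
call uniq.bat < "theFile.txt" > "result.txt"

rem Unsorted input: sort first so that equal lines end up next to each other.
sort "theFile.txt" | uniq.bat > "result.txt"
```

Lines that differ only in letter case may not be merged this way, because sort ignores case while the script compares lines case-sensitively.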

Both programs were copied from this post.

Aacini answered Sep 17 '22 07:09