Given an input file of text lines, I want duplicate lines to be identified and removed. Please show a simple snippet of C# that accomplishes this.
To start your duplicate search, go to File -> Find Duplicates or click the Find Duplicates button on the main toolbar. The Find Duplicates dialog will open, as shown below. The Find Duplicates dialog is intuitive and easy to use. The first step is to specify which folders should be searched for duplicates.
Uniq command is helpful to remove or detect duplicate entries in a file.
For small files:
string[] lines = File.ReadAllLines("filename.txt");
File.WriteAllLines("filename.txt", lines.Distinct().ToArray());
This should do (and will copy with large files).
Note that it only removes duplicate consecutive lines, i.e.
a
b
b
c
b
d
will end up as
a
b
c
b
d
If you want no duplicates anywhere, you'll need to keep a set of lines you've already seen.
using System;
using System.IO;
class DeDuper
{
static void Main(string[] args)
{
if (args.Length != 2)
{
Console.WriteLine("Usage: DeDuper <input file> <output file>");
return;
}
using (TextReader reader = File.OpenText(args[0]))
using (TextWriter writer = File.CreateText(args[1]))
{
string currentLine;
string lastLine = null;
while ((currentLine = reader.ReadLine()) != null)
{
if (currentLine != lastLine)
{
writer.WriteLine(currentLine);
lastLine = currentLine;
}
}
}
}
}
Note that this assumes Encoding.UTF8
, and that you want to use files. It's easy to generalize as a method though:
static void CopyLinesRemovingConsecutiveDupes
(TextReader reader, TextWriter writer)
{
string currentLine;
string lastLine = null;
while ((currentLine = reader.ReadLine()) != null)
{
if (currentLine != lastLine)
{
writer.WriteLine(currentLine);
lastLine = currentLine;
}
}
}
(Note that that doesn't close anything - the caller should do that.)
Here's a version that will remove all duplicates, rather than just consecutive ones:
static void CopyLinesRemovingAllDupes(TextReader reader, TextWriter writer)
{
string currentLine;
HashSet<string> previousLines = new HashSet<string>();
while ((currentLine = reader.ReadLine()) != null)
{
// Add returns true if it was actually added,
// false if it was already there
if (previousLines.Add(currentLine))
{
writer.WriteLine(currentLine);
}
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With