
C# - Remove duplicate lines within a text file

Tags: c#, .net, windows

Could someone demonstrate how to check a file for duplicate lines and remove any duplicates, either overwriting the existing file or creating a new file with the duplicate lines removed?

Michael asked Dec 06 '22 21:12


1 Answer

If you're using .NET 4 or later, you can combine File.ReadLines and File.WriteAllLines:

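// Requires: using System.Collections.Generic; using System.IO; using System.Linq;
// HashSet<T>.Add returns false for a line that was already seen, so Where keeps only first occurrences.
// sourcePath and destinationPath must be different files: ReadLines keeps the source open while WriteAllLines writes.
var previousLines = new HashSet<string>();

File.WriteAllLines(destinationPath, File.ReadLines(sourcePath)
                                        .Where(line => previousLines.Add(line)));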

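This works in much the same way as LINQ's Distinct method, with one important difference: the documentation describes the output of Distinct as an unordered sequence, so it isn't guaranteed to be in the same order as the input (even though current implementations happen to preserve it). Filtering with Where and an explicit HashSet<T> does provide this guarantee, because Where preserves the source order and Add returns true only the first time a line is seen.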
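The question also asks about overwriting the existing file. Here is a minimal sketch of one way to do that, assuming the unique lines fit in memory. The class and method names (DuplicateLineRemover, DistinctInOrder, RemoveDuplicateLinesInPlace) are hypothetical and not part of the original answer. The idea is to finish reading before writing starts, since File.ReadLines streams lazily and keeps the file open while it is being enumerated:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DuplicateLineRemover
{
    // Hypothetical helper: yields each line the first time it appears,
    // preserving the order of first occurrences.
    public static IEnumerable<string> DistinctInOrder(IEnumerable<string> lines)
    {
        // Ordinal comparison is an assumption: lines must match exactly, including case.
        var seen = new HashSet<string>(StringComparer.Ordinal);
        foreach (var line in lines)
        {
            if (seen.Add(line)) // Add returns false for a line already in the set
                yield return line;
        }
    }

    // Hypothetical helper: overwrites the file at `path` with its de-duplicated contents.
    public static void RemoveDuplicateLinesInPlace(string path)
    {
        // ToList() enumerates the whole file and releases it
        // before WriteAllLines opens the same path for writing.
        List<string> unique = DistinctInOrder(File.ReadLines(path)).ToList();
        File.WriteAllLines(path, unique);
    }
}
```

Calling DuplicateLineRemover.RemoveDuplicateLinesInPlace("input.txt") (the file name is only an example) would then rewrite that file in place. If case-insensitive matching is wanted, passing StringComparer.OrdinalIgnoreCase to the HashSet<string> constructor should be enough.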

LukeH answered Dec 17 '22 20:12
