Efficient way to remove duplicate strings from a string array in C#

I would like to know an efficient method to remove duplicate items from a string array in C#.

For example,

string[] a = { "abc", "xyz","abc", "def", "ghi", "asdf", "ghi","xd", "abc" };

will become,

string[] a = { "abc", "xyz","def", "ghi", "asdf", "xd" };

How do I fill the gaps left after removing the duplicate entries? Is there a way to do this without using an extra array to store the elements?

The method I used:

1) Sorted the array.

2) Replaced the duplicate entries with null.

3) Copied the non-null strings to a new array.

But I am looking for a more optimized way of doing the same.
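
For reference, the three steps above can be sketched roughly as follows. This is only an illustration of the sort-and-copy approach, not the asker's actual code: the class and method names are made up, the input is assumed to contain no null elements, and ordinal comparison is an arbitrary choice. It sticks to features available in C# 2.0.

```csharp
using System;

static class SortAndCompact
{
    // Hypothetical helper: returns a new, sorted array without duplicates.
    // Assumes the input contains no null elements, because null is used as the marker.
    public static string[] RemoveDuplicates(string[] input)
    {
        string[] sorted = (string[])input.Clone();
        if (sorted.Length == 0) return sorted;

        // 1) Sort so that equal strings end up next to each other.
        Array.Sort(sorted, StringComparer.Ordinal);

        // 2) Replace every repeat of the last kept value with null.
        int unique = 1;
        string last = sorted[0];
        for (int i = 1; i < sorted.Length; i++)
        {
            if (string.Equals(sorted[i], last, StringComparison.Ordinal))
            {
                sorted[i] = null;
            }
            else
            {
                last = sorted[i];
                unique++;
            }
        }

        // 3) Copy the non-null strings to a new array of exactly the right size.
        string[] result = new string[unique];
        int next = 0;
        for (int i = 0; i < sorted.Length; i++)
        {
            if (sorted[i] != null) result[next++] = sorted[i];
        }
        return result;
    }
}
```

The sort dominates the cost at O(n log n) comparisons, and the original order of the elements is lost.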

EDIT: I am using .NET 2.0 and VS 2005

SyncMaster asked Apr 11 '11 07:04


2 Answers

You can use a HashSet:

string[] a = { "abc", "xyz","abc", "def", "ghi", "asdf", "ghi","xd", "abc" };
var b = new HashSet<string>(a);
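
If an array is needed again afterwards, the set can be copied back out. The sketch below is one way to do that with HashSet<string>.CopyTo, which avoids a dependency on LINQ; the class and method names are hypothetical. Note that HashSet<T> was introduced in .NET Framework 3.5, so this does not apply to the .NET 2.0 setup mentioned in the question.

```csharp
using System;
using System.Collections.Generic;

static class HashSetToArray
{
    // Hypothetical helper: returns the distinct strings of the input as a new array.
    // The order of the result is whatever the set's enumeration yields and should not be relied on.
    public static string[] Distinct(string[] input)
    {
        HashSet<string> set = new HashSet<string>(input);   // duplicates are dropped on insertion
        string[] result = new string[set.Count];
        set.CopyTo(result);                                 // copies every element of the set
        return result;
    }
}
```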
Ohad Schneider answered Sep 28 '22 05:09


You can't resize an array in .NET, so whatever way you use to remove the duplicates, you have to create a new array for the result.
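
To illustrate the point, here is a minimal sketch that closes the gaps inside the existing array and then shrinks it with Array.Resize. Array.Resize still allocates a new array internally and copies the elements over, so the array itself is not resized. The class and method names are hypothetical, the input is assumed to contain no null elements, and a Dictionary<string, bool> stands in for a set so that the code also works on .NET 2.0.

```csharp
using System;
using System.Collections.Generic;

static class InPlaceDedup
{
    // Hypothetical helper: keeps the first occurrence of each string, in original order.
    // Assumes no null elements, since Dictionary keys cannot be null.
    public static void RemoveDuplicates(ref string[] items)
    {
        Dictionary<string, bool> seen = new Dictionary<string, bool>();
        int write = 0;
        for (int read = 0; read < items.Length; read++)
        {
            string s = items[read];
            if (seen.ContainsKey(s)) continue;   // already kept once, skip it
            seen.Add(s, true);
            items[write++] = s;                  // move the survivor into the next free slot
        }
        // Allocates a new array of length 'write' and copies the kept prefix into it.
        Array.Resize(ref items, write);
    }
}
```

With the array from the question, this leaves "abc", "xyz", "def", "ghi", "asdf", "xd" in that order. The dictionary is still extra storage, so this trades memory for a single pass over the data.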

You can use a HashSet<string> to easily remove the duplicates:

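// Needs: using System.Collections.Generic; using System.Linq; (ToArray is a LINQ extension method)
a = new HashSet<string>(a).ToArray();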

The hash set adds the items from the array to itself and automatically discards the duplicates. Because the hash set uses hash codes to check for existing items, this will be somewhat faster than sorting the items; however, the result is of course not sorted.
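
The discarding behaviour can be seen through the return value of HashSet<string>.Add, which is false when the item is already present. The sketch below counts the discarded items and, in case a sorted result is wanted after all, sorts the distinct items afterwards. The class, method, and parameter names are hypothetical, ordinal ordering is an assumption, and the code needs .NET 3.5 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DistinctSorted
{
    // Hypothetical helper: returns the distinct strings in ordinal order
    // and reports how many duplicates the set rejected.
    public static string[] Run(string[] input, out int discarded)
    {
        HashSet<string> set = new HashSet<string>();
        discarded = 0;
        foreach (string s in input)
        {
            // Add returns false if the set already contains the item.
            if (!set.Add(s)) discarded++;
        }

        string[] result = set.ToArray();               // LINQ's Enumerable.ToArray
        Array.Sort(result, StringComparer.Ordinal);    // only the distinct items are sorted
        return result;
    }
}
```

Sorting after deduplication means the sort only has to handle the distinct items rather than the full input.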

Guffa answered Sep 28 '22 06:09