I would like to know an efficient method to remove duplicate items from a string array in C#.
For example,
string[] a = { "abc", "xyz","abc", "def", "ghi", "asdf", "ghi","xd", "abc" };
will become,
string[] a = { "abc", "xyz","def", "ghi", "asdf", "xd" };
How can I fill the gaps left after removing the duplicate entries? Is there a way to do this without using an extra array to store the elements?
The method I used:
1) Sorted the array
2) Replaced the duplicate entries with null
3) Copied the non-null strings to a new array.
But I am looking for a more efficient way of doing the same.
EDIT: I am using .NET 2.0 and VS 2005
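For reference, the sort-based method described above can be sketched roughly as follows. This is only an illustration, not the asker's actual code: instead of writing null over duplicates, it closes the gaps in place with a write index and then trims the array. DedupSorted and SortedDistinct are hypothetical names, ordinal (case-sensitive) comparison is an assumption, and the original order of the items is lost. It uses only APIs available in .NET 2.0.

```csharp
using System;

static class DedupSorted
{
    // Hypothetical helper: returns the distinct strings of 'input' in sorted order.
    public static string[] SortedDistinct(string[] input)
    {
        // Work on a copy so the caller's array is left untouched.
        string[] items = (string[])input.Clone();
        Array.Sort(items, StringComparer.Ordinal);

        if (items.Length == 0)
        {
            return items;
        }

        // After sorting, duplicates are adjacent. 'write' is the next free slot;
        // each new value is moved down to it, which fills the gaps in place.
        int write = 1;
        for (int read = 1; read < items.Length; read++)
        {
            if (!string.Equals(items[read], items[write - 1], StringComparison.Ordinal))
            {
                items[write] = items[read];
                write++;
            }
        }

        // Array.Resize allocates a new, shorter array and copies the kept items.
        Array.Resize(ref items, write);
        return items;
    }
}
```

With the array from the question, SortedDistinct should return the six distinct strings in ordinal order: "abc", "asdf", "def", "ghi", "xd", "xyz".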
There are multiple ways to remove duplicates other than using Remove Duplicates Stage. As stated above you can use Sort stage, Transformer stage. In sort stage, you can enable Key Change() column and it will be useful to filter the duplicate records. You can use Aggregator stage to remove duplicates.
You can use a HashSet:
string[] a = { "abc", "xyz","abc", "def", "ghi", "asdf", "ghi","xd", "abc" };
var b = new HashSet<string>(a);
Arrays in .NET have a fixed size (even Array.Resize allocates a new array and copies the elements into it), so whichever way you remove the duplicates, you have to create a new array for the result.
You can use a HashSet<string> to easily remove the duplicates:
a = new HashSet<string>(a).ToArray();
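The hash set adds the items from the array to itself and automatically discards the duplicates. Because the hash set uses hash codes to check for existing items, this is usually faster than sorting the items first; the result, however, is not sorted. Note that HashSet<T> and the ToArray() extension method (from System.Linq) both require .NET 3.5 or later, so this does not compile against .NET 2.0.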
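Since the question mentions .NET 2.0, here is a minimal sketch of the same hash-based idea using only types that exist there: a Dictionary<string, bool> stands in for the set and a List<string> collects the result. DedupNet20 and Distinct are hypothetical names, and ordinal (case-sensitive) comparison is an assumption. Unlike the sort-based approach, it keeps the first occurrence of each string in its original position.

```csharp
using System;
using System.Collections.Generic;

static class DedupNet20
{
    // Hypothetical helper: returns the distinct strings of 'input',
    // keeping the first occurrence of each in its original order.
    public static string[] Distinct(string[] input)
    {
        // The dictionary is used only for its keys, as a set of strings already seen.
        Dictionary<string, bool> seen = new Dictionary<string, bool>(StringComparer.Ordinal);
        List<string> result = new List<string>(input.Length);
        bool seenNull = false; // Dictionary keys cannot be null, so null is tracked separately.

        foreach (string s in input)
        {
            if (s == null)
            {
                if (!seenNull)
                {
                    seenNull = true;
                    result.Add(null);
                }
            }
            else if (!seen.ContainsKey(s))
            {
                seen.Add(s, true);
                result.Add(s);
            }
        }

        // List<T>.ToArray() is an instance method, so no LINQ is needed.
        return result.ToArray();
    }
}
```

With the array from the question, Distinct should return "abc", "xyz", "def", "ghi", "asdf", "xd", which matches the desired output. A new array is still allocated for the result.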