My C# program generates random strings from a given pattern. These strings are stored in a list. As no duplicates are allowed I'm doing it like this:
    List<string> myList = new List<string>();
    for (int i = 0; i < total; i++)
    {
        string random_string = GetRandomString(pattern);
        if (!myList.Contains(random_string))
            myList.Add(random_string);
    }
As you can imagine, this works fine for several hundred entries. But I now need to generate several million strings, and with each added string the duplicate check gets slower, because Contains has to scan the whole list.
Are there any faster ways to avoid duplicates?
If you don't want duplicates, use a set instead of a list. To convert a list to a set you can use the following code (shown here in Java syntax; C# has an equivalent HashSet<T> constructor that accepts a collection):
    // list is some List of Strings
    Set<String> s = new HashSet<String>(list);
If really necessary, you can use the same construction to convert the set back into a list.
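Since the question is about C#, here is a minimal sketch of the same list-to-set-and-back idea using HashSet<string>. The class and method names (DedupExample, RemoveDuplicates) are made up for illustration and do not come from the original posts.

```csharp
using System;
using System.Collections.Generic;

public static class DedupExample
{
    // Hypothetical helper: returns a new list with duplicates removed by
    // copying the input through a HashSet<string>.
    // Note: HashSet<T> does not promise any particular enumeration order.
    public static List<string> RemoveDuplicates(List<string> list)
    {
        var set = new HashSet<string>(list); // duplicates are dropped here
        return new List<string>(set);        // back to a list, if one is really needed
    }
}
```

This cleans up a list after the fact. If the strings are still being generated, checking against the set while adding avoids building the duplicate-laden list in the first place.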
In Python, you can remove duplicates from a list with the built-in set() function, which builds a set containing only the distinct elements.
In C#, collections such as ArrayList and List<T> simply add values without checking for duplicates. To avoid storing duplicate data, ...
Use a data structure that can determine much more efficiently whether an item exists, namely a HashSet<T>. It can determine whether an item is in the set in constant time on average, regardless of the number of items in the set.
If you really need the items in a List<T> instead, or you need the items in the resulting list to be in the order they were generated, then you can store the data in both a list and a hash set, adding the item to both collections only if it is not already in the HashSet<T>.
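The list-plus-hash-set approach described in this answer could be sketched roughly as follows. The class and method names (UniqueStringGenerator, Generate) are illustrative assumptions, and the generator is passed in as a Func<string> so that the sketch does not depend on the asker's GetRandomString implementation.

```csharp
using System;
using System.Collections.Generic;

public static class UniqueStringGenerator
{
    // Hypothetical helper: calls `generate` exactly `total` times and keeps
    // each distinct result once, in the order it was first generated.
    // Like the loop in the question, it may return fewer than `total` items
    // when duplicates occur.
    public static List<string> Generate(Func<string> generate, int total)
    {
        var seen = new HashSet<string>();   // fast membership checks
        var result = new List<string>();    // preserves generation order

        for (int i = 0; i < total; i++)
        {
            string candidate = generate();

            // HashSet<T>.Add returns false if the item is already present,
            // so a separate Contains call is not needed.
            if (seen.Add(candidate))
                result.Add(candidate);
        }

        return result;
    }
}
```

With the code from the question, a call might look like UniqueStringGenerator.Generate(() => GetRandomString(pattern), total). If the order of the strings does not matter, the list can be dropped and the HashSet<string> used on its own.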