I have this code, which works fine, but is slow on large datasets.
I'd like to hear from the experts if this code could benefit from using Linq, or another method, and if so, how?
Dim array_of_strings As String()
' now I add strings to my array, these come from external file(s).
' This does not take long
' Throughout the execution of my program, I need to validate millions
' of other strings.
Dim search_string As String
Dim indx As Integer
' So we get millions of situations like this, where I need to find out
' where in the array I can find a duplicate of this exact string
search_string = "the_string_search_for"
indx = array_of_strings.ToList().IndexOf(search_string)
Each of the strings in my array is unique; there are no duplicates.
This works pretty well but, like I said, it is too slow for larger datasets. I am running this query millions of times. Currently it takes about 1 minute for a million queries, which is too slow for my liking.
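There's no need to use LINQ. Your current line copies the whole array into a new list and then scans it linearly on every query, so each lookup is O(n). If you used an indexed data structure like a dictionary, the search would be O(1) on average (Dictionary is a hash table), at the cost of a slightly longer process of filling the structure. Since you fill it once and then do a million searches, you're going to come out ahead.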
See the description of Dictionary at this site: https://msdn.microsoft.com/en-us/library/7y3x785f(v=vs.110).aspx
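A minimal sketch of the dictionary approach, assuming you still need the position of the string in the original array. The sample data and the name index_of_string are made up for illustration; array_of_strings, search_string and indx are taken from the question.

```vbnet
Imports System
Imports System.Collections.Generic

Module DictionaryLookupSketch
    Sub Main()
        ' Sample data standing in for the strings loaded from the external file(s).
        Dim array_of_strings As String() = {"alpha", "beta", "the_string_search_for"}

        ' Build the index once: each string maps to its position in the array.
        ' index_of_string is a hypothetical name. StringComparer.Ordinal spells out
        ' the exact, case-sensitive matching that IndexOf performs on strings.
        Dim index_of_string As New Dictionary(Of String, Integer)(array_of_strings.Length, StringComparer.Ordinal)
        For i As Integer = 0 To array_of_strings.Length - 1
            index_of_string(array_of_strings(i)) = i
        Next

        ' Each of the millions of lookups is now a hash lookup instead of a scan.
        Dim search_string As String = "the_string_search_for"
        Dim indx As Integer
        ' TryGetValue returns False for a missing key; use -1 to mirror IndexOf.
        If Not index_of_string.TryGetValue(search_string, indx) Then
            indx = -1
        End If
        Console.WriteLine(indx)
    End Sub
End Module
```

The loop assigns through the indexer, so a repeated string would silently keep its last position. Since the question says the strings are unique, that should not matter here. Use Add instead if you would rather get an exception on a duplicate.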
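Since (I think) you're talking about a collection that is its own key, you could save some memory by using SortedSet<T>, whose lookups are O(log n). Note that a set only tells you whether the string is present, not its position in the array.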
https://msdn.microsoft.com/en-us/library/dd412070(v=vs.110).aspx
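If a yes/no answer is enough, the set variant could look roughly like this. Again a sketch with made-up sample data; known_strings is a hypothetical name.

```vbnet
Imports System
Imports System.Collections.Generic

Module SortedSetLookupSketch
    Sub Main()
        ' Sample data standing in for the strings loaded from the external file(s).
        Dim array_of_strings As String() = {"alpha", "beta", "the_string_search_for"}

        ' Build the set once. The ordinal comparer is passed explicitly because
        ' the default string comparer for a sorted set is culture-sensitive.
        Dim known_strings As New SortedSet(Of String)(array_of_strings, StringComparer.Ordinal)

        ' Contains walks a balanced tree, so each lookup is O(log n).
        Dim found As Boolean = known_strings.Contains("the_string_search_for")
        Console.WriteLine(found)
    End Sub
End Module
```

HashSet(Of String) offers the same Contains call with average O(1) lookups, if you do not need the elements kept in sorted order.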