Originally i wanted to ask for the fastest way to query a Datatable for a special row.
I have tested 5 different methods for their performance with a surprising(for me) result.
Background: I have created a View in a MS Sql-Server 2005 Database. This view has current a total count of 6318 rows. Because i must check very often if a given id exists in this view, i wondered what is the most efficient way to do. I created a DataAdapter in a strongly typed dataset which returns all rows and fills a Datatable. My first approach was to create a shared generic List(of Int32) and fill it with the ID's from the view once on Application start. Then use List.Contains to check if the current ID is in this List. Because all rows are distinct i wondered if its not faster to use a SortedList and its ContainsKey-metod. Then i checked also the performance of direct access to the Datable with its Select-Method, its automatically generated(when column is defined as primary-key) FindBy-method and last but not least the DatarowCollection.Contains-Method. So i have 5 Methods to check if my ID is in that View(or mapped List/SortedList).
I measured their performance with the System.Diagnostics.StopWatch and got some interesting results. I thought the SortedList.ContainsKey must be faster than the List.Contains because they're distinct and sorted but the opposite is true. But the most surprising for me was that the DataRowCollection.Contains-Method(that i first had forgotten) is by far the fastest. It is even 50 times faster than the dataTable.FindBy-method.
Results [for 1000000 iterations*]
Timespan 5 = DataRowCollection.Contains = Ø 0.00638 [1202.79735] ms
1.)
Timespan 1: 0,6913 ms
Timespan 2: 0,1053 ms
Timespan 3: 0,3279 ms
Timespan 4: 0,1002 ms
Timespan 5: 0,0056 ms
2.)
Timespan 1: 0,6405 ms
Timespan 2: 0,0588 ms
Timespan 3: 0,3112 ms
Timespan 4: 0,3872 ms
Timespan 5: 0,0067 ms
3.)
Timespan 1: 0,6502 ms
Timespan 2: 0,0588 ms
Timespan 3: 0,3092 ms
Timespan 4: 0,1268 ms
Timespan 5: 0,007 ms
4.)
Timespan 1: 0,6504 ms
Timespan 2: 0,0586 ms
Timespan 3: 0,3092 ms
Timespan 4: 0,3893 ms
Timespan 5: 0,0063 ms
5.)
Timespan 1: 0,6493 ms
Timespan 2: 0,0586 ms
Timespan 3: 0,3215 ms
Timespan 4: 0,386 ms
Timespan 5: 0,0063 ms
Timespan 1: 0,6913 0,6405 0,6502 0,6504 0,6493 = Ø 0,65634
Timespan 2: 0,1053 0,0588 0,0588 0,0586 0,0586 = Ø 0,06802
Timespan 3: 0,3279 0,3112 0,3092 0,3092 0,3215 = Ø 0,31580
Timespan 4: 0,1002 0,3872 0,1268 0,3893 0,3860 = Ø 0,27790
Timespan 5: 0,0056 0,0067 0,0070 0,0063 0,0063 = Ø 0,00638
And for the sake of completeness part of the VB.Net source:
Dim applies As Boolean
Dim clock As New System.Diagnostics.Stopwatch
clock.Start()
For i As Int32 = 1 To 1000000
applies = sortedListAC17NextClaims.ContainsKey(myClaim.idData)
Next
clock.Stop()
Dim timeSpan1 As String = "Timespan 1: " & clock.Elapsed.TotalMilliseconds.ToString & " ms"
clock.Reset()
clock.Start()
For i As Int32 = 1 To 1000000
applies = listAC17NextClaims.Contains(myClaim.idData)
Next
clock.Stop()
Dim timeSpan2 As String = "Timespan 2: " & clock.Elapsed.TotalMilliseconds.ToString & " ms"
clock.Reset()
clock.Start()
For i As Int32 = 1 To 1000000
applies = Not MyDS.AC17NextClaims.FindByIdData(myClaim.idData) Is Nothing
Next
clock.Stop()
Dim timeSpan3 As String = "Timespan 3: " & clock.Elapsed.TotalMilliseconds.ToString & " ms"
clock.Reset()
clock.Start()
For i As Int32 = 1 To 1000000
applies = MyDS.AC17NextClaims.Select("idData=" & myClaim.idData).Length > 0
Next
clock.Stop()
Dim timeSpan4 As String = "Timespan 4: " & clock.Elapsed.TotalMilliseconds.ToString & " ms"
clock.Reset()
clock.Start()
For i As Int32 = 1 To 1000000
applies = MyDS.AC17NextClaims.Rows.Contains(myClaim.idData)
Next
clock.Stop()
Dim timeSpan5 As String = "Timespan 5: " & clock.Elapsed.TotalMilliseconds.ToString & " ms"
UPDATE: I have changed my results and the source above. In squared brackets are the values for 1000000 iterations. Now the result is completely different. The fastest method now is definitely is the ContainsKey of the SortedList.
UPDATE 2: I forgot the alternative to use List.BinarySearch. This seems to be fastest for me:
clock.Start()
For i As Int32 = 1 To 1000000
applies = listAC17NextClaims.BinarySearch(myClaim.idData) > -1
Next
clock.Stop()
needs only 219.1805 ms to perform 1000000 iterations and hence is the fastest without the overhead of a SortedList-KeyValue-Pair. I can use it without to sort the list because the DataAdapter filled the datatable with an Order By clause.
Why don't you use a collection that has a HashTable as the underlying data structure (Dictionary<TKey, TValue>
or HashSet<T>
)? HashTables
should provide O(1)
look up time if there are no collisions in keys (as you've stated) and doesn't need the overhead to "sort".
EDIT: If you only want to store the keys, you should use HashSet<T> which is available in .NET 3.5 and above.
From MSDN on SortedList:
Operations on a SortedList object tend to be slower than operations on a Hashtable object because of the sorting.
To target .NET 2.0, you could roll your own or use a pre-built one like Wintellect's Power Collections (you could easily just use the source as well).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With