Speed of DataSet row/column lookups?

Recently I had to do some very processing-heavy work with data stored in a DataSet. It was heavy enough that I ended up using a profiling tool to help identify bottlenecks in my code. While analyzing them, I noticed that although DataSet lookups were not terribly slow (they weren't the bottleneck), they were slower than I expected. I always assumed that DataSets used some sort of Hashtable-style implementation, which would make lookups O(1) (or at least that's what I think Hashtables are). My lookups seemed significantly slower than that.

I was wondering if anyone who knows anything about the implementation of .NET's DataSet class would care to share what they know.

If I do something like this:

DataTable dt = new DataTable();
if(dt.Columns.Contains("SomeColumn"))
{
    object o = dt.Rows[0]["SomeColumn"];
}

How fast would the lookup be for the Contains(...) method, and for retrieving the value to store in object o? I would have thought it would be very fast, like a Hashtable (assuming what I understand about Hashtables is correct), but it doesn't seem like it...

I wrote that code from memory so some things may not be "syntactically correct".

asked Sep 28 '08 00:09 by Dan Herbert



2 Answers

Actually, it's advisable to use an integer index when referencing a column, which can improve performance considerably. To keep things manageable, you could declare a constant integer for each column. So instead of what you did, you could do:

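const int SomeTable_SomeColumn = 0;

DataTable dt = new DataTable();
// DataColumnCollection.Contains only accepts a column name, so check the
// ordinal against Columns.Count instead, and make sure there is a row to read.
if (SomeTable_SomeColumn < dt.Columns.Count && dt.Rows.Count > 0)
{
    object o = dt.Rows[0][SomeTable_SomeColumn];
}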
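A hard-coded constant silently reads the wrong column if the column order ever changes. A middle ground is to resolve the name to an ordinal once, before a loop, and use the integer indexer inside it. The sketch below assumes the column holds non-null int values; the class and method names are hypothetical. DataColumnCollection.IndexOf(string) returns -1 when no column has that name, which plays the role of the Contains check from the question.

```csharp
using System;
using System.Data;

public static class OrdinalLookup
{
    // Sums an int column, resolving the column name to an ordinal once
    // instead of looking the name up on every row access.
    // Returns null when the column does not exist.
    public static long? SumColumn(DataTable dt, string columnName)
    {
        // IndexOf returns -1 when no column has this name.
        int ordinal = dt.Columns.IndexOf(columnName);
        if (ordinal < 0)
            return null;

        long total = 0;
        foreach (DataRow row in dt.Rows)
        {
            // Integer indexer: no name lookup inside the loop.
            // Assumes non-null int values; a DBNull would make this cast throw.
            total += (int)row[ordinal];
        }
        return total;
    }
}
```

This keeps the code readable (the column is still named in one place) while paying the name lookup only once per call instead of once per row.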
answered Nov 11 '22 07:11 by faulty



Looking at it in Reflector, the steps for DataRow["ColumnName"] are:

  1. Get the DataColumn from ColumnName. Uses the row's DataColumnCollection["ColumnName"]. Internally, DataColumnCollection stores its DataColumns in a Hashtable. O(1)
  2. Get the DataRow's row index. The index is stored in an internal member. O(1)
  3. Get the DataColumn's value at the index using DataColumn[index]. DataColumn stores its data in a System.Data.Common.DataStorage (internal, abstract) member:

    return dataColumnInstance._storage.Get(recordIndex);

    A sample concrete implementation is System.Data.Common.StringStorage (internal, sealed). StringStorage (and the other concrete DataStorages I checked) store their values in an array. Get(recordIndex) simply grabs the object in the value array at the recordIndex. O(1)
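The three steps can be modeled with a small toy version to make the cost structure concrete. This is only an illustration under simplifying assumptions: ToyTable, ToyColumn and ToyRow are hypothetical types, not the real System.Data classes, and a Dictionary stands in for the internal Hashtable and a plain object[] for DataStorage.

```csharp
using System;
using System.Collections.Generic;

// Toy column: stands in for DataColumn plus its DataStorage value array.
public sealed class ToyColumn
{
    public string Name { get; }
    private readonly object[] _storage;

    public ToyColumn(string name, int capacity)
    {
        Name = name;
        _storage = new object[capacity];
    }

    // Step 3: plain array access by record index, O(1).
    public object Get(int recordIndex) => _storage[recordIndex];
    public void Set(int recordIndex, object value) => _storage[recordIndex] = value;
}

// Toy table: stands in for the DataColumnCollection lookup.
public sealed class ToyTable
{
    // Step 1: name -> column through a hash table, O(1) on average.
    private readonly Dictionary<string, ToyColumn> _columns =
        new Dictionary<string, ToyColumn>();

    public ToyColumn AddColumn(string name, int capacity)
    {
        var col = new ToyColumn(name, capacity);
        _columns.Add(name, col);
        return col;
    }

    public bool ContainsColumn(string name) => _columns.ContainsKey(name);
    public ToyColumn GetColumn(string name) => _columns[name];
}

// Toy row: only remembers which record (array slot) it refers to.
public sealed class ToyRow
{
    private readonly ToyTable _table;
    private readonly int _recordIndex; // Step 2: stored on the row, O(1)

    public ToyRow(ToyTable table, int recordIndex)
    {
        _table = table;
        _recordIndex = recordIndex;
    }

    // row["Name"]: hash the name, then index into the column's array.
    // Neither step depends on how many rows or columns exist.
    public object this[string columnName] =>
        _table.GetColumn(columnName).Get(_recordIndex);
}
```

Each access does a string hash plus a couple of method calls and an array read, which is constant-time but not free.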

So overall you're O(1) but that doesn't mean the hashing and function calling during the operation is without cost. It just means it doesn't cost more as the number of DataRows or DataColumns increases.
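One way to see that constant overhead is to read the same column three ways: by name, by a cached DataColumn (DataRow has an indexer taking a DataColumn), and by ordinal. The sketch below is a rough timing harness, not a rigorous benchmark (no warm-up, single run); the class name, column names and sizes are assumptions for illustration, and the printed timings will vary by machine and runtime, so no particular numbers are implied.

```csharp
using System;
using System.Data;
using System.Diagnostics;

public static class LookupTiming
{
    // Builds a table with columnCount int columns named Col0, Col1, ...
    // and rowCount rows, where row r, column c holds r + c.
    public static DataTable BuildTable(int columnCount, int rowCount)
    {
        var dt = new DataTable();
        for (int c = 0; c < columnCount; c++)
            dt.Columns.Add("Col" + c, typeof(int));
        for (int r = 0; r < rowCount; r++)
        {
            DataRow row = dt.NewRow();
            for (int c = 0; c < columnCount; c++)
                row[c] = r + c;
            dt.Rows.Add(row);
        }
        return dt;
    }

    // Sums one column by name, by cached DataColumn, and by ordinal,
    // printing the elapsed time of each pass. Returns the three sums
    // so callers can check that all access paths agree.
    public static (long byName, long byColumn, long byOrdinal) SumThreeWays(
        DataTable dt, string columnName)
    {
        long byName = 0, byColumn = 0, byOrdinal = 0;
        DataColumn col = dt.Columns[columnName];
        int ordinal = col.Ordinal;

        var sw = Stopwatch.StartNew();
        foreach (DataRow row in dt.Rows) byName += (int)row[columnName];
        Console.WriteLine($"by name:    {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        foreach (DataRow row in dt.Rows) byColumn += (int)row[col];
        Console.WriteLine($"by column:  {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        foreach (DataRow row in dt.Rows) byOrdinal += (int)row[ordinal];
        Console.WriteLine($"by ordinal: {sw.ElapsedMilliseconds} ms");

        return (byName, byColumn, byOrdinal);
    }
}
```

Growing the row count should scale all three passes roughly linearly (each lookup is O(1)); any gap between them reflects the per-lookup hashing and call overhead.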

Interesting that DataStorage uses an array for values. Can't imagine that's easy to rebuild when you add or remove rows.
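An array-backed store does not necessarily have to be rebuilt on every add or remove. A common general technique is to grow capacity by doubling (so appends are amortized O(1)) and to keep a free list of released slots so that removals never shift other records. The sketch below shows that technique only; GrowableColumnStore is a hypothetical type, and this is not a claim about how DataStorage actually manages its arrays.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical array-backed column store using capacity doubling
// plus a free list of released slots.
public sealed class GrowableColumnStore<T>
{
    private T[] _values = new T[4];
    private int _used;                                    // slots ever handed out
    private readonly Stack<int> _free = new Stack<int>(); // released slots to reuse

    public int Capacity => _values.Length;

    // Stores a value and returns its record index.
    public int Add(T value)
    {
        int index;
        if (_free.Count > 0)
        {
            index = _free.Pop(); // reuse a released slot: nothing is shifted
        }
        else
        {
            if (_used == _values.Length)
                Array.Resize(ref _values, _values.Length * 2); // amortized O(1) growth
            index = _used++;
        }
        _values[index] = value;
        return index;
    }

    // Releases a slot; every other record keeps its index.
    // (No double-remove check in this sketch.)
    public void Remove(int index)
    {
        if ((uint)index >= (uint)_used)
            throw new ArgumentOutOfRangeException(nameof(index));
        _values[index] = default(T);
        _free.Push(index);
    }

    public T Get(int index)
    {
        if ((uint)index >= (uint)_used)
            throw new ArgumentOutOfRangeException(nameof(index));
        return _values[index];
    }
}
```

With this layout, a record index stays valid for the lifetime of the record, which is what allows a row to cache its index as in step 2.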

answered Nov 11 '22 08:11 by Corbin March
