
Design advice. Using DataTable or List<MyObject> for a generic rule checker

I have about 100,000 rows of generic data. The columns/properties of this data are user-definable and use the usual data types (string, int, double, date). There will be about 50 columns/properties.

I have 2 needs:

  • To be able to calculate new columns/properties using an expression
    e.g. Column3 = Column1 * Column2.
    Ultimately I would like to be able to use external data using a callback,
    e.g. Column3 = Column1 * GetTemperature
    The expressions are relatively simple; basic maths operations plus Sum, Count and IF are the only functions needed.
  • To be able to filter/group the data and perform aggregations
    e.g. Sum(Data.Column1) Where(Data.Column2 == "blah")

    As far as I can see I have two options:
    1. Using a DataTable.
    => Point 1 above is achieved by using DataColumn.Expression
    => Point 2 above is achieved by using DataTable.DefaultView.RowFilter or DataTable.Select() & C# code

    2. Using a List of generic objects, each with a Dictionary<string, object> to store the values.
    => Point 1 could be achieved by something like NCalc
    => Point 2 is achieved using LINQ
    DataTable:
    Pros: DataColumn.Expression is inbuilt
    Cons: RowFilter & hand-written C# is not as "nice" as LINQ, 
          DataColumn.Expression does not support callbacks(?) 
          => workaround could be to get & replace external value when creating 
             the calculated column
    
    GenericList:
    Pros: LINQ syntax, NCalc supports callbacks
    Cons: Implementing NCalc/generic calc engine
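    To make the two options concrete, here is a minimal sketch of both, using illustrative column names and toy data (two rows instead of 100,000). Option 1 uses DataColumn.Expression for the calculated column and DataTable.Compute for a filtered aggregate; option 2 models rows as dictionaries and aggregates with LINQ:

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Data;
    using System.Linq;

    class OptionsSketch
    {
        static void Main()
        {
            // Option 1: DataTable with a calculated column and a filtered aggregate.
            var table = new DataTable();
            table.Columns.Add("Column1", typeof(double));
            table.Columns.Add("Column2", typeof(string));
            // Point 1: calculated column via DataColumn.Expression
            table.Columns.Add("Column3", typeof(double), "Column1 * 2");
            table.Rows.Add(3.0, "blah");
            table.Rows.Add(5.0, "other");
            // Point 2: Compute takes an aggregate expression and a row filter
            var dtSum = (double)table.Compute("Sum(Column1)", "Column2 = 'blah'");

            // Option 2: a list of dictionaries, filtered/aggregated with LINQ.
            var rows = new List<Dictionary<string, object>>
            {
                new Dictionary<string, object> { ["Column1"] = 3.0, ["Column2"] = "blah" },
                new Dictionary<string, object> { ["Column1"] = 5.0, ["Column2"] = "other" },
            };
            var listSum = rows.Where(r => (string)r["Column2"] == "blah")
                              .Sum(r => (double)r["Column1"]);

            Console.WriteLine(dtSum);   // 3
            Console.WriteLine(listSum); // 3
        }
    }
    ```

    Both routes express "Sum(Column1) where Column2 == 'blah'"; the difference is where the expression language lives (DataTable's built-in parser vs. LINQ/an external engine).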
    

    Based on the above I would think the generic-list approach wins, but one thing I have not factored in is performance, which I suspect (without hard evidence) would be better with a DataTable.
    Does anyone have a gut feeling / experience with LINQ vs. DataTable performance?
    How about NCalc?
    As I said, there are about 100,000 rows of data with 50 columns, of which maybe 20 are calculated. About 50 rules will be run against the data, so there will be roughly 5 million row/object scans in total.
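    On the NCalc question specifically: it does support callbacks through its EvaluateFunction event, which would cover the GetTemperature-style external lookup. A sketch, assuming the NCalc package is referenced (GetTemperature and the value 21.5 are purely illustrative):

    ```csharp
    using NCalc;

    var expr = new Expression("Column1 * GetTemperature()");
    expr.Parameters["Column1"] = 2.0;

    // Custom-function callback: NCalc raises this event for any
    // function name it does not recognise built in.
    expr.EvaluateFunction += (name, args) =>
    {
        if (name == "GetTemperature")
            args.Result = 21.5; // hypothetical external data lookup
    };

    var result = expr.Evaluate(); // 2.0 * 21.5
    ```

    For 100,000 rows × 50 rules it would matter that the Expression is parsed once and re-evaluated per row with updated Parameters, rather than constructed fresh each time.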

    Would really appreciate any insights. Thx.
    ps. Of course using a database + SQL & Views etc. would be the easiest solution, but for various reasons can't be implemented.

    asked Jun 07 '10 by Andrew White


    1 Answer

    Well, using DataTable does not preclude the use of LINQ:

    table.Rows.Cast<DataRow>() // IEnumerable<DataRow>; LINQ it to death
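    For example (illustrative column names), the question's filter-plus-aggregate can be written against a DataTable with LINQ, using the Field<T> extension from System.Data.DataSetExtensions:

    ```csharp
    using System;
    using System.Data;
    using System.Linq;

    class LinqOverDataTable
    {
        static void Main()
        {
            var table = new DataTable();
            table.Columns.Add("Column1", typeof(double));
            table.Columns.Add("Column2", typeof(string));
            table.Rows.Add(3.0, "blah");
            table.Rows.Add(5.0, "other");

            // Sum(Column1) where Column2 == "blah", expressed in LINQ
            var sum = table.Rows.Cast<DataRow>()
                           .Where(r => r.Field<string>("Column2") == "blah")
                           .Sum(r => r.Field<double>("Column1"));
            Console.WriteLine(sum); // 3
        }
    }
    ```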
    

    This guy makes some arguments about HashTable vs. DataTable and this guy finds Dictionary better than DataTable, but not by much (factors in Dictionary create cost).

    Note: if the columns are known beforehand (that is, the user selects from a predefined set of (name, type) columns), I would go with strongly typed classes, since data["property"] does not get IntelliSense support the way data.Property does.
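    A sketch of that strongly typed shape (the class and property names here are illustrative):

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical strongly typed row, usable when columns come
    // from a predefined set rather than being fully dynamic.
    class Reading
    {
        public double Column1 { get; set; }
        public string Column2 { get; set; }
    }

    class TypedSketch
    {
        static void Main()
        {
            var data = new List<Reading>
            {
                new Reading { Column1 = 3.0, Column2 = "blah" },
                new Reading { Column1 = 5.0, Column2 = "other" },
            };
            // r.Column2 gets IntelliSense and compile-time checking;
            // r["Column2"] would not.
            var sum = data.Where(r => r.Column2 == "blah").Sum(r => r.Column1);
            Console.WriteLine(sum); // 3
        }
    }
    ```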

    answered Oct 24 '22 by bottlenecked