Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C# LINQ and calculations involving large datasets

This is more of a technical "how-to" or "best approach" question.

We have a current requirement to retrieve records from the database, place them into an 'in-memory' list and then perform a series of calculations on the data, i.e. maximum values, averages and some more specific custom statistics as well.

Getting the data into an 'in-memory' list is not a problem as we use NHibernate as our ORM and it does an excellent job of retrieving data from the database. The advice I am seeking is how should we best perform calculations on the resulting list of data.

Ideally I would like to create a method for each statistic, MaximumValue(), AverageValueUnder100(), MoreComplicatedStatistic() etc etc. Of course passing the required variables to each method and having it return the result. This approach would also make unit testing a breeze and provide us with excellent coverage.

Would there be a performance hit if we perform a LINQ query for each calculation or should be consolidate as many calls to each statistic method in as few LINQ queries as possible. For example it doesn't make much sense to pass the list of data to a method called AverageValueBelow100 and then pass the entire list of data to another method AverageValueBelow50 when they could effectively be performed with one LINQ query.

How can we achieve a high level of granularity and separation without sacrificing performance?

Any advice ... is the question clear enough?

like image 760
Rowen Avatar asked Nov 05 '22 15:11

Rowen


1 Answers

Depending on the complexity of the calculation, it may be best to do it in the database. If it is signifcantly complex that you need to bring it in as objects and encur that overhead, you may want to avoid multiple iterations over your result set. you may want to consider using Aggregate. See http://geekswithblogs.net/malisancube/archive/2009/12/09/demystifying-linq-aggregates.aspx for a discussion if it. You would be able to unit test each aggregate separately, but then (potentially) project multiple aggregates within a single iteration.

like image 138
Jim Wooley Avatar answered Nov 12 '22 20:11

Jim Wooley