I am working on a section of a project that uses a large number of sum methods, applied to a DataTable.
To test which method is best, I use the following DataTable structure:
class LogParser
{
    public DataTable PGLStat_Table = new DataTable();

    public LogParser()
    {
        PGLStat_Table.Columns.Add("type", typeof(string));
        PGLStat_Table.Columns.Add("desc", typeof(string));
        PGLStat_Table.Columns.Add("count", typeof(int));
        PGLStat_Table.Columns.Add("duration", typeof(decimal));
        PGLStat_Table.Columns.Add("cper", typeof(decimal));
        PGLStat_Table.Columns.Add("dper", typeof(decimal));
        PGLStat_Table.Columns.Add("occurancedata", typeof(string));
    }
}
The following code is used to fill the table:
LogParser pglp = new LogParser();
Random r2 = new Random();
for (int i = 1; i < 1000000; i++)
{
    int c2 = r2.Next(1, 1000);
    pglp.PGLStat_Table.Rows.Add("Type" + i.ToString(), "desc" + i, c2, 0, 0, 0, " ");
}
The following methods are used to calculate the sum.
Method 1, using Compute:
Stopwatch s2 = new Stopwatch();
s2.Start();
object sumObject;
sumObject = pglp.PGLStat_Table.Compute("Sum(count)", " ");
s2.Stop();
long d1 = s2.ElapsedMilliseconds;
Method 2, using a foreach loop:
s2.Restart();
int totalcount = 0;
foreach (DataRow dr in pglp.PGLStat_Table.Rows)
{
    int c = Convert.ToInt32(dr["count"].ToString());
    totalcount = totalcount + c;
}
s2.Stop();
long d2 = s2.ElapsedMilliseconds;
Method 3, using LINQ:
s2.Restart();
var sum = pglp.PGLStat_Table.AsEnumerable().Sum(x => x.Field<int>("count"));
MessageBox.Show(sum.ToString());
s2.Stop();
long d3 = s2.ElapsedMilliseconds;
After comparison, the results are:
a) foreach is the fastest: 481 ms
b) next is LINQ: 1016 ms
c) and then Compute: 2253 ms
Query 1
I accidentally changed c2 to i in the following statement:
pglp.PGLStat_Table.Rows.Add("Type" + i.ToString(), "desc" + i, i, 0, 0, 0, " ");
The LINQ statement now produces an error:
Arithmetic operation resulted in an overflow.
Compute and the foreach loop, on the other hand, still complete the computation, although the result may be incorrect.
Is such behaviour a cause for concern, or am I missing a directive? (Also, the figures computed are large.)
Query 2
I was under the impression that LINQ does it fastest; is there an optimized method or parameter that makes it perform better?
Thanks for the advice,
arvind
The fastest sum is the following (with the DataColumn looked up once beforehand and a direct cast to int):
static int Sum(LogParser pglp)
{
    // look up the DataColumn once, outside the loop, instead of by name on every row
    var column = pglp.PGLStat_Table.Columns["count"];
    int totalcount = 0;
    foreach (DataRow dr in pglp.PGLStat_Table.Rows)
    {
        // direct unboxing cast; no ToString/Convert round trip
        totalcount += (int)dr[column];
    }
    return totalcount;
}
Statistics (average of 3 runs):
00:00:00.1442297, for/each, by column, (int)
00:00:00.1595430, for/each, by column, Field<int>
00:00:00.6961964, for/each, by name, Convert.ToInt
00:00:00.1959104, linq, cast<DataRow>, by column, (int)
Other code:
static int Sum_ForEach_ByColumn_Field(LogParser pglp)
{
    var column = pglp.PGLStat_Table.Columns["count"];
    int totalcount = 0;
    foreach (DataRow dr in pglp.PGLStat_Table.Rows)
    {
        totalcount += dr.Field<int>(column);
    }
    return totalcount;
}

static int Sum_ForEach_ByName_Convert(LogParser pglp)
{
    int totalcount = 0;
    foreach (DataRow dr in pglp.PGLStat_Table.Rows)
    {
        int c = Convert.ToInt32(dr["count"].ToString());
        totalcount = totalcount + c;
    }
    return totalcount;
}

static int Sum_Linq(LogParser pglp)
{
    var column = pglp.PGLStat_Table.Columns["count"];
    return pglp.PGLStat_Table.Rows.Cast<DataRow>().Sum(row => (int)row[column]);
}
var data = GenerateData(); // builds a LogParser table filled as in the question
// warm-up calls so that JIT compilation does not skew the first measurement
Sum(data);
Sum_Linq(data);
var count = 3;
foreach (var info in new[]
{
    new {Name = "for/each, by column, (int)", Method = (Func<LogParser, int>)Sum},
    new {Name = "for/each, by column, Field<int>", Method = (Func<LogParser, int>)Sum_ForEach_ByColumn_Field},
    new {Name = "for/each, by name, Convert.ToInt", Method = (Func<LogParser, int>)Sum_ForEach_ByName_Convert},
    new {Name = "linq, cast<DataRow>, by column, (int)", Method = (Func<LogParser, int>)Sum_Linq},
})
{
    var watch = new Stopwatch();
    for (var i = 0; i < count; ++i)
    {
        watch.Start();
        var sum = info.Method(data);
        watch.Stop();
    }
    Console.WriteLine("{0}, {1}", TimeSpan.FromTicks(watch.Elapsed.Ticks / count), info.Name);
}
Well, you could improve a bit on the LINQ example (AsEnumerable), but this is expected behavior: LINQ to Objects cannot be faster than a plain loop (you could do even better by using a for (var i = ...) loop instead of the foreach, as sketched below). I guess what you meant to use was LINQ to SQL; then the aggregation (Sum) would be done on the database and it should be faster, but as you don't seem to be using database data...
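For reference, a minimal sketch of that indexed for loop variant (assuming the same LogParser table from the question; DataRowCollection exposes Count and an int indexer):

var rows = pglp.PGLStat_Table.Rows;
var column = pglp.PGLStat_Table.Columns["count"];
int total = 0;
for (var i = 0; i < rows.Count; i++)
{
    // indexed access avoids the enumerator object that foreach allocates
    total += (int)rows[i][column];
}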
Query 1.
As you can see in the documentation, the Enumerable.Sum extension method throws an OverflowException on integer overflow. DataTable.Compute has no such functionality, and neither do the unchecked integer operations you use in Method 2.
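If the wrap-around itself is the concern rather than the exception, a minimal sketch (assuming the same "count" column as above) is to widen the accumulator to long, which avoids the overflow in both the LINQ and the loop versions:

// Enumerable.Sum with a long selector accumulates in a checked 64-bit integer,
// so even the sum of 1..999,999 (about 5 * 10^11) fits comfortably
long totalLinq = pglp.PGLStat_Table.AsEnumerable()
    .Sum(r => (long)r.Field<int>("count"));

// equivalent widening for the foreach version
long totalLoop = 0;
foreach (DataRow dr in pglp.PGLStat_Table.Rows)
{
    totalLoop += dr.Field<int>("count");
}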
UPDATE:
Query 2.
I was under the impression that LINQ does it fastest; is there an optimized method or parameter that makes it perform better?
AFAIK, there is no way to optimize the summation algorithm itself (without using parallel computing). LINQ roughly doubles the time taken by foreach, so I don't think the issue is LINQ performance but rather Compute's inefficiency (note that there is an overhead for interpreting the query string).
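For completeness, a hedged sketch of the parallel option mentioned above, using PLINQ over the rows (assuming the same LogParser table; the actual speedup depends on row count and core count):

static int Sum_Plinq(LogParser pglp)
{
    var column = pglp.PGLStat_Table.Columns["count"];
    // partition the rows across cores; the per-row work is the same cast used in Sum_Linq
    return pglp.PGLStat_Table.Rows.Cast<DataRow>()
        .AsParallel()
        .Sum(row => (int)row[column]);
}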