Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Comparing Sum Methods in C#

I am working on a section of a project that uses large number of sum methods. These sum methods are applied on a Datatable

To test the best method, I use the following

Datatable structure

class LogParser
{
     public DataTable PGLStat_Table = new DataTable();
     public LogParser()
     {
         PGLStat_Table.Columns.Add("type", typeof(string)); 
         PGLStat_Table.Columns.Add("desc", typeof(string)); 
         PGLStat_Table.Columns.Add("count", typeof(int));
         PGLStat_Table.Columns.Add("duration", typeof(decimal));
         PGLStat_Table.Columns.Add("cper", typeof(decimal));
         PGLStat_Table.Columns.Add("dper", typeof(decimal));
         PGLStat_Table.Columns.Add("occurancedata", typeof(string));  
     }       
}

Following method is used to Fill the table

LogParser pglp = new LogParser();
Random r2 = new Random();
for (int i = 1; i < 1000000; i++)
{
    int c2 = r2.Next(1, 1000);
    pglp.PGLStat_Table.Rows.Add("Type" + i.ToString(), "desc" + i , c2, 0, 0, 0, " ");
}
  • Sum is applied on count column, where value of c2 is updated

Following Methods used to calculate Sum

Method 1 using Compute

Stopwatch s2 = new Stopwatch();
s2.Start();
object sumObject;
sumObject = pglp.PGLStat_Table.Compute("Sum(count)", " ");
s2.Stop();
long d1 = s2.ElapsedMilliseconds;

Method 2 using Foreach loop

s2.Restart();
int totalcount = 0;
foreach (DataRow dr in pglp.PGLStat_Table.Rows)
{
   int c = Convert.ToInt32(dr["count"].ToString());
   totalcount = totalcount + c;
}
s2.Stop();
long d2 = s2.ElapsedMilliseconds;

Method 3 using Linq

s2.Restart();
var sum = pglp.PGLStat_Table.AsEnumerable().Sum(x => x.Field<int>("count"));
MessageBox.Show(sum.ToString());
s2.Stop();
long d3 = s2.ElapsedMilliseconds;

After Comparison the results are

a) foreach is the fastest 481ms

b) next is linq 1016ms

c) and then Compute 2253ms


Query 1

I accidentally change "c2 to i" in the following statement

 pglp.PGLStat_Table.Rows.Add("Type" + i.ToString(), "desc" + i , i, 0, 0, 0, " ");

The Linq statement produces an error

Arithmetic operation resulted in an overflow.

Whereas the Compute and Foreach loop are still able to complete the computation although maybe incorrect.

Is such a behaviour cause of concern or am I missing a directive ? (also the figures computed are large)

Query 2

I was under the impression Linq does it fastest, is there a optimized method or parameter that makes it perform better.

thanks for advice

arvind

like image 213
arvind Avatar asked Apr 03 '12 05:04

arvind


3 Answers

Fastest sum is next (with precompute DataColumn and direct cast to int):

  static int Sum(LogParser pglp)
  {
    var column = pglp.PGLStat_Table.Columns["count"];
    int totalcount = 0;
    foreach (DataRow dr in pglp.PGLStat_Table.Rows)
    {
      totalcount += (int)dr[column];
    }
    return totalcount;
  }

Statistic:

00:00:00.1442297, for/each, by column, (int)
00:00:00.1595430, for/each, by column, Field<int>
00:00:00.6961964, for/each, by name, Convert.ToInt
00:00:00.1959104, linq, cast<DataRow>, by column, (int)

Other code:

  static int Sum_ForEach_ByColumn_Field(LogParser pglp)
  {
    var column = pglp.PGLStat_Table.Columns["count"];
    int totalcount = 0;
    foreach (DataRow dr in pglp.PGLStat_Table.Rows)
    {
      totalcount += dr.Field<int>(column);
    }
    return totalcount;
  }
  static int Sum_ForEach_ByName_Convert(LogParser pglp)
  {
    int totalcount = 0;
    foreach (DataRow dr in pglp.PGLStat_Table.Rows)
    {
      int c = Convert.ToInt32(dr["count"].ToString());
      totalcount = totalcount + c;
    }
    return totalcount;
  }
  static int Sum_Linq(LogParser pglp)
  {
    var column = pglp.PGLStat_Table.Columns["count"];
    return pglp.PGLStat_Table.Rows.Cast<DataRow>().Sum(row => (int)row[column]);
  }


    var data = GenerateData();
    Sum(data);
    Sum_Linq2(data);
    var count = 3;
    foreach (var info in new[]
      {
        new {Name = "for/each, by column, (int)", Method = (Func<LogParser, int>)Sum},
        new {Name = "for/each, by column, Field<int>", Method = (Func<LogParser, int>)Sum_ForEach_ByColumn_Field},
        new {Name = "for/each, by name, Convert.ToInt", Method = (Func<LogParser, int>)Sum_ForEach_ByName_Convert},
        new {Name = "linq, cast<DataRow>, by column, (int)", Method = (Func<LogParser, int>)Sum_Linq},
      })
    {
      var watch = new Stopwatch();
      for (var i = 0; i < count; ++i)
      {
        watch.Start();
        var sum = info.Method(data);
        watch.Stop();
      }
      Console.WriteLine("{0}, {1}", TimeSpan.FromTicks(watch.Elapsed.Ticks / count), info.Name);
    }
like image 131
Serj-Tm Avatar answered Oct 19 '22 01:10

Serj-Tm


well you could improve a bit on the linq example (AsEnumerable) but this is expected behavior - Linq(2objects) cannot be faster as a loop (you could do even better by using a for(var i = ...) loop instead of the foreach) - I guess what you meant to do was using Linq2Sql - then the aggregation (sum) will be done on the database and it should be faster - but as you don't seem to use database-data...

like image 35
Random Dev Avatar answered Oct 18 '22 23:10

Random Dev


Query 1.

As you can see in documentation Enumerable.Sum extension method throws an OverflowException on integer overflow. DataTable.Compute has no such a functionality as well as integer operations you use in Method 2.


UPDATE: Query 2.

I was under the impression Linq does it fastest, is there a optimized method or parameter that makes it perform better.

AFAIK, there is no method's to optimize array summation algorithm (without using parallel computing). Linq doubles the time used by foreach. So, I don't think that's about linq performance but compute inefficiency (note that there is an overhead for query string interpretation).

like image 20
default locale Avatar answered Oct 18 '22 23:10

default locale