
Improve LINQ performance

Tags: c#, .net, linq

I have a LINQ statement like this:

var records = from line in myfile 
              let data = line.Split(',')
              select new { a=int.Parse(data[0]), b=int.Parse(data[1]) };
var average = records.Sum(r => r.b)!=0?records.Sum(r => r.a) / records.Sum(r => r.b):0;

My question is: how many times is records.Sum(r => r.b) computed in the last line? Does LINQ loop over all the records each time it needs to compute a sum (in this case, 3 Sum() calls, so 3 loops)? Or does it smartly loop over all the records just once and compute all the sums?


Edit 1:

  1. Is there any way to improve it by going through all the records just once (as a plain for loop would, since it needs only a single pass)?

  2. There is really no need to load everything into memory before computing the sums and the average. Surely we can add up each element while loading it from the file. Is there any way to reduce the memory consumption as well?


Edit 2

Just to clarify a bit: I wasn't using LINQ before I ended up with the code above. A plain while/for loop meets all the performance requirements. I then tried to improve readability and reduce the lines of code by using LINQ. It seems that we can't get both at the same time.

james asked Dec 30 '15 14:12




3 Answers

Twice. Write it like this and it will be computed once:

var sum = records.Sum(r => r.b);

var average = sum != 0 ? records.Sum(r => r.a) / sum : 0;
mybirthname answered Sep 28 '22 09:09


There are plenty of answers, but none that wrap all of your questions up.

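How many times is records.Sum(r => r.b) computed in the last line?

Sum is called three times in total, so the records are enumerated three times. records.Sum(r => r.b) itself is evaluated twice whenever it is non-zero.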

Does LINQ loop over all the records each time when it needs to compute a sum (in this case, 3 Sum() so loop 3 times)?

Yes.

Or does it smartly loop over all the records just once and compute all the sums?

No.
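To see the repeated enumeration for yourself, here is a minimal sketch. The EnumerationCountDemo class and its Counted() wrapper are hypothetical names made up for illustration; the wrapper only counts how often the source is read from the start while the expression from the question is evaluated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationCountDemo
{
    // Evaluates the expression from the question and reports how many
    // times the source lines were enumerated from the beginning.
    public static (int Passes, int Average) Run(IEnumerable<string> lines)
    {
        int passes = 0;

        // Hypothetical counting wrapper: an iterator body restarts on every
        // enumeration, so the counter goes up once per pass over the data.
        IEnumerable<string> Counted()
        {
            passes++;
            foreach (var line in lines)
                yield return line;
        }

        // The query is only a description; nothing is read until Sum runs.
        var records = from line in Counted()
                      let data = line.Split(',')
                      select new { a = int.Parse(data[0]), b = int.Parse(data[1]) };

        // Same expression as in the question: each Sum call is a full pass.
        int average = records.Sum(r => r.b) != 0
            ? records.Sum(r => r.a) / records.Sum(r => r.b)
            : 0;

        return (passes, average);
    }
}
```

Calling EnumerationCountDemo.Run(new[] { "4,2", "6,3" }) should report three passes, one per Sum call. If the b values sum to zero, only the first Sum runs, so only one pass is made.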

Is there any way to improve it by going through all the records just once (as a plain for loop would, since it needs only a single pass)?

You can. Caching the records (for example with ToList()) lets you read the source only once, but that eagerly loads all the data, which contradicts your next question.

There is really no need to load everything into memory before computing the sums and the average. Surely we can add up each element while loading it from the file. Is there any way to reduce the memory consumption as well?

That's correct. In your original post you have a variable called myfile, and you iterate over it, putting each item into a range variable called line (read: basically a foreach). Since you didn't show how you got your myfile data, I'm assuming that you're eagerly loading all of it.

Here's a quick example of lazy-loading your data:

public IEnumerable<string> GetData()
{
    using (var fileStream = File.OpenRead(@"C:\Temp\MyData.txt"))
    {
        using (var streamReader = new StreamReader(fileStream))
        {
            string line;
            while ((line = streamReader.ReadLine()) != null)
            {                       
                yield return line;
            }
        }
    }
}

public void CalculateSumAndAverage()
{
    var sumA = 0;
    var sumB = 0;
    var average = 0;

    foreach (var line in GetData())
    {
        var split = line.Split(',');
        var a = Convert.ToInt32(split[0]);
        var b = Convert.ToInt32(split[1]);

        sumA += a;
        sumB += b;
    }

    // I'm not a big fan of ternary operators,
    // but feel free to convert this if you so desire.
    if (sumB != 0)
    {
        average = sumA / sumB;
    }
    else 
    {
        // This else clause is redundant, but I converted it from a ternary operator.
        average = 0;
    }
}
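If you would rather keep a LINQ style, the same single-pass, lazy approach can be sketched with File.ReadLines (which streams lines on demand) and Aggregate with a tuple seed. This is only a sketch under assumptions: the StreamingSums class and its method names are made up for illustration, every line is assumed to hold two comma-separated integers, and tuples need C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class StreamingSums
{
    // Computes sum(a) / sum(b) in a single pass over the lines,
    // keeping only the two running totals in memory.
    public static int RatioOfSums(IEnumerable<string> lines)
    {
        var (sumA, sumB) = lines
            .Select(line => line.Split(','))
            .Aggregate(
                (SumA: 0, SumB: 0),                         // seed: both totals start at 0
                (acc, data) => (acc.SumA + int.Parse(data[0]),
                                acc.SumB + int.Parse(data[1])));

        // Integer division, as in the original code; guard against a zero divisor.
        return sumB != 0 ? sumA / sumB : 0;
    }

    // File.ReadLines yields one line at a time instead of loading the whole file.
    public static int RatioOfSumsFromFile(string path) =>
        RatioOfSums(File.ReadLines(path));
}
```

Because the seed overload of Aggregate is used, an empty input simply returns 0 instead of throwing.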
Cameron answered Sep 28 '22 08:09


The records are enumerated three times, once per Sum call. What you should use here is Aggregate, not Sum.

// do your original selection
var records = from line in myfile
              let data = line.Split(',')
              select new { a = int.Parse(data[0]), b = int.Parse(data[1]) };
// aggregate them into one record in a single pass; anonymous-type properties
// are read-only, so each step builds a new record instead of mutating one
var sumRec = records.Aggregate(
    new { a = 0, b = 0 },
    (runningSum, next) => new { a = runningSum.a + next.a, b = runningSum.b + next.b });
// Calculate your average
var average = sumRec.b != 0 ? sumRec.a / sumRec.b : 0;
flindeberg answered Sep 28 '22 09:09