Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Improve the time complexity of current Linq queries

I have the following lists:

RakeSnapshots, ProductMovements

Aim is to process the both and get the count of elements that match a condition, as follows:

  • Consider RakeSnapshots with StatusCode == "Dumping"

  • Consider ProductMovement with Status == "InProgress"

  • Fetch the count of all elements both lists, which meet the condition RakeSnapshots.RakeCode equal to ProductMovements.ProductCode

Following are my current options:

// Code 1:

 var resultCount =  ProductMovements.Where(x => RakeSnapshots
                                                .Where(r => r.StatusCode == "Dumping")
                                                .Any(y => y.RakeCode == x.ProductCode  && 
                                                          x.Status == "InProgress"))
                                                .Count();

// Code 2:

var productMovementsInprogress = ProductMovements.Where(x => x.Status == "InProgress");

var rakeSnapShotsDumping = RakeSnapshots.Where(r => r.StatusCode == "Dumping");

var resultCount = productMovementsInprogress.Zip(rakeSnapShotsDumping,(x,y) => (y.RakeCode == x.ProductCode) ?  true : false)
                                            .Where(x => x).Count();

Challenge is both the codes are O(n^2) complexity, is there a way to improve it, this will hurt if the data is very large

like image 917
Mrinal Kamboj Avatar asked Feb 06 '23 07:02

Mrinal Kamboj


2 Answers

You can use an inner join to do this:

var dumpingRakeSnapshots       = rakeSnapshots.Where(r => r.StatusCode == "Dumping");
var inProgressProductMovements = productMovements.Where(p => p.Status == "InProgress");

var matches =
    from r in dumpingRakeSnapshots
    join p in inProgressProductMovements on r.RakeCode equals p.ProductCode
    select r;

int count = matches.Count(); // Here's the answer.

Note that (as Ivan Stoev points out) this only works if RakeCode is the primary key of RakeSnapshots.

If it is not, you will have to use a grouped join.

Here's the Linq query syntax version that you should use in that case, but note that this is exactly the same as Ivan's answer (only in Linq query form):

var matches =
    from r in dumpingRakeSnapshots
    join p in inProgressProductMovements on r.RakeCode equals p.ProductCode into gj
    select gj;

For completeness, here's a compilable console app that demonstrates the different results you'll get if RakeCode and ProductCode are not primary keys:

using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApp1
{
    class RakeSnapshot
    {
        public string StatusCode;
        public string RakeCode;
    }

    class ProductMovement
    {
        public string Status;
        public string ProductCode;
    }

    sealed class Program
    {
        void run()
        {
            var rakeSnapshots = new List<RakeSnapshot>
            {
                new RakeSnapshot {StatusCode = "Dumping", RakeCode = "1"},
                new RakeSnapshot {StatusCode = "Dumping", RakeCode = "1"},
                new RakeSnapshot {StatusCode = "Dumping", RakeCode = "2"}
            };

            var productMovements = new List<ProductMovement>
            {
                new ProductMovement {Status = "InProgress", ProductCode = "1"},
                new ProductMovement {Status = "InProgress", ProductCode = "2"},
                new ProductMovement {Status = "InProgress", ProductCode = "2"}
            };

            var dumpingRakeSnapshots       = rakeSnapshots.Where(r => r.StatusCode == "Dumping");
            var inProgressProductMovements = productMovements.Where(p => p.Status == "InProgress");

            // Inner join.

            var matches1 =
                from r in dumpingRakeSnapshots
                join p in inProgressProductMovements on r.RakeCode equals p.ProductCode
                select r;

            Console.WriteLine(matches1.Count());

            // Grouped join.

            var matches2 =
                from r in dumpingRakeSnapshots
                join p in inProgressProductMovements on r.RakeCode equals p.ProductCode into gj
                select gj;

            Console.WriteLine(matches2.Count());

            // OP's code.

            var resultCount = 
                productMovements
                .Count(x => rakeSnapshots
                .Where(r => r.StatusCode == "Dumping")
                .Any(y => y.RakeCode == x.ProductCode && x.Status == "InProgress"));

            Console.WriteLine(resultCount);
        }

        static void Main(string[] args)
        {
            new Program().run();
        }
    }
}
like image 155
Matthew Watson Avatar answered Feb 08 '23 23:02

Matthew Watson


Sounds like Group Join which (as well as Join) is the most efficient LINQ way of correlating two sets:

var resultCount = ProductMovements.Where(p => p.Status == "InProgress")
    .GroupJoin(RakeSnapshots.Where(r => r.StatusCode == "Dumping"), 
        p => p.ProductCode, r => r.RakeCode, (p, match) => match)
    .Count(match => match.Any());

The time complexity of the above is O(N+M).

like image 34
Ivan Stoev Avatar answered Feb 09 '23 00:02

Ivan Stoev