I have the following lists:
RakeSnapshots, ProductMovements
Aim is to process the both and get the count of elements that match a condition, as follows:
Consider RakeSnapshots
with StatusCode == "Dumping"
Consider ProductMovement
with Status == "InProgress"
count
of all elements both lists, which meet the condition RakeSnapshots.RakeCode
equal to ProductMovements.ProductCode
Following are my current options:
// Code 1:
var resultCount = ProductMovements.Where(x => RakeSnapshots
.Where(r => r.StatusCode == "Dumping")
.Any(y => y.RakeCode == x.ProductCode &&
x.Status == "InProgress"))
.Count();
// Code 2:
var productMovementsInprogress = ProductMovements.Where(x => x.Status == "InProgress");
var rakeSnapShotsDumping = RakeSnapshots.Where(r => r.StatusCode == "Dumping");
var resultCount = productMovementsInprogress.Zip(rakeSnapShotsDumping,(x,y) => (y.RakeCode == x.ProductCode) ? true : false)
.Where(x => x).Count();
Challenge is both the codes are O(n^2)
complexity, is there a way to improve it, this will hurt if the data is very large
You can use an inner join
to do this:
var dumpingRakeSnapshots = rakeSnapshots.Where(r => r.StatusCode == "Dumping");
var inProgressProductMovements = productMovements.Where(p => p.Status == "InProgress");
var matches =
from r in dumpingRakeSnapshots
join p in inProgressProductMovements on r.RakeCode equals p.ProductCode
select r;
int count = matches.Count(); // Here's the answer.
Note that (as Ivan Stoev points out) this only works if RakeCode is the primary key of RakeSnapshots
.
If it is not, you will have to use a grouped join
.
Here's the Linq query syntax version that you should use in that case, but note that this is exactly the same as Ivan's answer (only in Linq query form):
var matches =
from r in dumpingRakeSnapshots
join p in inProgressProductMovements on r.RakeCode equals p.ProductCode into gj
select gj;
For completeness, here's a compilable console app that demonstrates the different results you'll get if RakeCode
and ProductCode
are not primary keys:
using System;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApp1
{
class RakeSnapshot
{
public string StatusCode;
public string RakeCode;
}
class ProductMovement
{
public string Status;
public string ProductCode;
}
sealed class Program
{
void run()
{
var rakeSnapshots = new List<RakeSnapshot>
{
new RakeSnapshot {StatusCode = "Dumping", RakeCode = "1"},
new RakeSnapshot {StatusCode = "Dumping", RakeCode = "1"},
new RakeSnapshot {StatusCode = "Dumping", RakeCode = "2"}
};
var productMovements = new List<ProductMovement>
{
new ProductMovement {Status = "InProgress", ProductCode = "1"},
new ProductMovement {Status = "InProgress", ProductCode = "2"},
new ProductMovement {Status = "InProgress", ProductCode = "2"}
};
var dumpingRakeSnapshots = rakeSnapshots.Where(r => r.StatusCode == "Dumping");
var inProgressProductMovements = productMovements.Where(p => p.Status == "InProgress");
// Inner join.
var matches1 =
from r in dumpingRakeSnapshots
join p in inProgressProductMovements on r.RakeCode equals p.ProductCode
select r;
Console.WriteLine(matches1.Count());
// Grouped join.
var matches2 =
from r in dumpingRakeSnapshots
join p in inProgressProductMovements on r.RakeCode equals p.ProductCode into gj
select gj;
Console.WriteLine(matches2.Count());
// OP's code.
var resultCount =
productMovements
.Count(x => rakeSnapshots
.Where(r => r.StatusCode == "Dumping")
.Any(y => y.RakeCode == x.ProductCode && x.Status == "InProgress"));
Console.WriteLine(resultCount);
}
static void Main(string[] args)
{
new Program().run();
}
}
}
Sounds like Group Join which (as well as Join
) is the most efficient LINQ way of correlating two sets:
var resultCount = ProductMovements.Where(p => p.Status == "InProgress")
.GroupJoin(RakeSnapshots.Where(r => r.StatusCode == "Dumping"),
p => p.ProductCode, r => r.RakeCode, (p, match) => match)
.Count(match => match.Any());
The time complexity of the above is O(N+M).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With