Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to construct a having statement in linq with a max value?

I have a collection of objects containing transactional data. This dataset has a 3 level parent-child relationship with the levels being: Project > Stage > Transaction. For each project and stage there will be multiple transactions.

class CostElement
{
    string ProjectNumber { get; set; }
    string Stage { get; set; }
    string DerivedCostElement { get; set; }
    DateTime? Date { get; set; }
    decimal? Amount { get; set; }
}

What I want to find is a unique list of cost elements (DerivedCostElement), with a count of the maximum number of times it is used in any given single project. I.e. Count the number of stages it appears in for each project and then select the maximum value.

What I have tried so far is:

//count all cost elements per project
var aggregateQuery = from ce in _costElements
                     group ce by new { ce.ProjectNumber, ce.Stage, ce.DerivedCostElement }
                     into g
                     select new
                     {
                         g.Key.DerivedCostElement,
                         g.Key.ProjectNumber,
                         g.Key.Stage
                     };

//select cost elements having max count
var countQuery = from g in aggregateQuery
                 group g by new { g.DerivedCostElement, g.ProjectNumber }
                 into grp
                 select new CostElementCount
                 {
                     CostElement = grp.Key.DerivedCostElement,
                     ProjectNumber = grp.Key.ProjectNumber,
                     Count = grp.Count()
                 };

return countQuery.ToList();

This for the most part works, I end up with a list of cost elements per project and the maximum number of occurrences per project. However the results are still per project, what I want is a unique list with only the maximum value.

Can someone please help me obtain the correct result? Also any suggestions on how to write this more efficiently or succinctly would be greatly appreciated.

like image 355
codemonkeh Avatar asked Mar 21 '23 16:03

codemonkeh


1 Answers

I'm not sure about the writing the first two queries you have more efficiently, but I think adding the following line before your return statement would get the results you want.

countQuery = countQuery.OrderByDescending(x => x.Count).GroupBy(x => x.CostElement).Select(g => g.First());

Or to keep the syntax more consistent

countQuery = from cq in countQuery
             orderby cq.Count descending
             group cq by cq.CostElement
                 into grp
             select grp.First();

Here is the test data I used...

_costElements = new List<CostElement>();
_costElements.Add(new CostElement() { ProjectNumber = "a", Stage = "a", DerivedCostElement = "ce6" });
_costElements.Add(new CostElement() { ProjectNumber = "a", Stage = "b", DerivedCostElement = "ce1" });
_costElements.Add(new CostElement() { ProjectNumber = "a", Stage = "x", DerivedCostElement = "ce1" });
_costElements.Add(new CostElement() { ProjectNumber = "a", Stage = "b", DerivedCostElement = "ce2" });
_costElements.Add(new CostElement() { ProjectNumber = "a", Stage = "b", DerivedCostElement = "ce3" });
_costElements.Add(new CostElement() { ProjectNumber = "a", Stage = "c", DerivedCostElement = "ce3" });
_costElements.Add(new CostElement() { ProjectNumber = "a", Stage = "d", DerivedCostElement = "ce3" });
_costElements.Add(new CostElement() { ProjectNumber = "b", Stage = "e", DerivedCostElement = "ce1" });
_costElements.Add(new CostElement() { ProjectNumber = "b", Stage = "f", DerivedCostElement = "ce1" });
_costElements.Add(new CostElement() { ProjectNumber = "c", Stage = "g", DerivedCostElement = "ce1" });
_costElements.Add(new CostElement() { ProjectNumber = "c", Stage = "h", DerivedCostElement = "ce1" });
_costElements.Add(new CostElement() { ProjectNumber = "c", Stage = "h", DerivedCostElement = "ce2" });
_costElements.Add(new CostElement() { ProjectNumber = "c", Stage = "i", DerivedCostElement = "ce2" });
_costElements.Add(new CostElement() { ProjectNumber = "d", Stage = "j", DerivedCostElement = "ce2" });
_costElements.Add(new CostElement() { ProjectNumber = "d", Stage = "k", DerivedCostElement = "ce1" });
_costElements.Add(new CostElement() { ProjectNumber = "d", Stage = "l", DerivedCostElement = "ce1" });
_costElements.Add(new CostElement() { ProjectNumber = "d", Stage = "m", DerivedCostElement = "ce1" });

And the results I came up with (with a bit of reordering of the countQuery results) were...

CostElement: ce1 ProjectNumber: d Count: 3

CostElement: ce2 ProjectNumber: c Count: 2

CostElement: ce3 ProjectNumber: a Count: 3

CostElement: ce6 ProjectNumber: a Count: 1

Which I think is a list of all cost elements, along with the project and count where each of them appears most. That output came from a ToString() I added to CostElementCount

public string ToString()
{
    return string.Format("CostElement: {0} ProjectNumber: {1} Count: {2}", CostElement, ProjectNumber, Count);
}
like image 62
S. Baggy Avatar answered Apr 25 '23 17:04

S. Baggy