Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

t-sql GROUP BY with COUNT, and then include MAX from the COUNT

Suppose you had a table of "Cars" with hundreds of thousands of rows, and you wanted to do a GROUP BY:

SELECT   CarID
         , CarName
         , COUNT(*) AS Total
FROM     dbo.tbl_Cars
GROUP BY CarID
         , CarName

The grouping leaves you with a result akin to:

CarID       CarName    Total
1872        Olds       202,121   
547841      BMW        175,298
9877        Ford        10,241

All fine and well. My question, though, is what is the best way to get the Total and the MAX Total into one table, in terms of performance and clean coding, so you have a result like:

CarID       CarName    Total      Max Total
1872        Olds       202,121    202,121
547841      BMW        175,298    202,121
9877        Ford        10,241    202,121 

One approach would be to put the GROUP result into a temp table, and then get the MAX from the temp table into a local variable. But I'm wondering what the best way to do this would be.


UPDATE

The Common Table Expression seems the most elegant to write, yet similar to @EBarr, my limited testing indicates a significantly slower performance. So I won't be going with the CTE.

As the link @EBarr has for the COMPUTE option indicates the feature is deprecated, that doesn't seem the best route, either.

The option of a local variable for the MAX value and the use of a temp table will likely be the route I go down, as I'm not aware of performance issues with it.

A bit more detail about my use case: it could probably end up being a series of other SO questions. But suffice to say that I'm loading a large subset of data into a temp table (so a subset of tbl_Cars is going into #tbl_Cars, and even #tbl_Cars may be further filtered and have aggregations performed on it), because I have to perform multiple filtering and aggregation queries on it within a single stored proc that returns multiple result sets.


UPDATE 2

@EBarr's use of a windowed function is nice and short. Note to self: if using a RIGHT JOIN to an outer reference table, the COUNT() function should select a column from tbl_Cars, not '*'.

SELECT       M.MachineID
             , M.MachineType
             , COUNT(C.CarID) AS Total
             , MAX(COUNT(C.CarID)) OVER() as MaxTotal
FROM         dbo.tbl_Cars C
RIGHT JOIN   dbo.tbl_Machines M
      ON     C.CarID = M.CarID
GROUP BY     M.MachineID
             , M.MachineType

In terms of speed, it seems fine, but at what point do you have to be worried about the number of reads?

like image 798
mg1075 Avatar asked Feb 09 '12 19:02

mg1075


1 Answers

Mechanically there are a few ways to do this. You could use temp tables/table variable. Another way is with nested queries and/or a CTE as @Aaron_Bertrand showed. A third way is to use WINDOWED FUNCTIONS such as...

SELECT    CarName,
          COUNT(*) as theCount,
          MAX(Count(*)) OVER(PARTITION BY 'foo') as MaxPerGroup
FROM      dbo.tbl_Cars
GROUP BY CarName

A DISFAVORED (read depricated) fourth way is using the COMPUTE keyword as such...

SELECT   CarID, CarName, Count(*)
FROM     dbo.tbl_Cars
GROUP BY CarID, CarName 
COMPUTE MAX(Count(*))   

The COMPUTE keyword generates totals that appear as additional summary columns at the end of the result set (see this). In the query above you will actually see two record sets.

Fastest

Now, the next issue is what's the "best/fastest/easiest." I immediately think of an indexed view. As @Aaron gently reminded me, indexed views have all sorts of restrictions. The above, strategy, however, allows you to create an indexed view on the SELECT...FROM..GROUP BY. Then selecting from the indexed view apply the WINDOWED FUNCTION clause.

Without knowing more, however, about your design it is going to be difficult for anyone tell you what's best. You will get lighting fast queries from an indexed view. That performance comes at a price, though. The price is maintenance costs. If the underlying table is the target of a large amount of insert/update/delete operations the maintenance of the indexed view will bog down performance in other areas.

If you share a bit more about your use case and data access patterns people will be able to share more insight.


MICRO PERFORMANCE TEST

So I generated a little data script and looked at sql profiler numbers for the CTE performance vs windowed functions. This is a micro-test, so try some real numbers in your system under real load.

Data generation:

Create table Cars ( CarID int identity (1,1) primary key, 
                    CarName varchar(20), 
                    value int)
GO
insert into Cars (CarName, value)
values  ('Buick', 100),
        ('Ford', 10),
        ('Buick', 300),     
        ('Buick', 100),
        ('Pontiac', 300),       
        ('Bmw', 100),
        ('Mecedes', 300),       
        ('Chevy', 300),     
        ('Buick', 100),
        ('Ford', 200);
GO 1000

This script generates 10,000 rows. I then ran each of the four following queries multiple times :

--just group by
select  CarName,COUNT(*) countThis
FROM    Cars
GROUP BY CarName        

--group by with compute (BAD BAD DEVELOPER!)
select  CarName,COUNT(*) countThis
FROM    Cars
GROUP BY CarName        
COMPUTE  MAX(Count(*));

-- windowed aggregates...
SELECT  CarName,
        COUNT(*) as theCount,
        MAX(Count(*)) OVER(PARTITION BY 'foo') as MaxInAnyGroup
FROM Cars
GROUP BY CarName        

--CTE version
;WITH x AS (
  SELECT   CarName,
           COUNT(*) AS Total
  FROM     Cars
  GROUP BY CarName
)
SELECT x.CarName, x.Total, x2.[Max Total]
FROM x CROSS JOIN (
  SELECT [Max Total] = MAX(Total) FROM x
) AS x2;

After running the above queries, I created an indexed view on the "just group by" query above. Then I ran a query on the indexed view that performed a MAX(Count(*)) OVER(PARTITION BY 'foo'.

AVERAGE RESULTS

Query                      CPU       Reads     Duration   
--------------------------------------------------------
Group By                   15        31        7 ms  
Group & Compute            15        31        7 ms
Windowed Functions         14        56        8 ms 
Common Table Exp.          16        62       15 ms
Windowed on Indexed View    0        24        0 ms

Obviously this is a micro-benchmark and only mildly instructive, so take it for what it's worth.

like image 51
EBarr Avatar answered Sep 18 '22 13:09

EBarr