Suppose you had a table of "Cars" with hundreds of thousands of rows, and you wanted to do a GROUP BY:
SELECT CarID
, CarName
, COUNT(*) AS Total
FROM dbo.tbl_Cars
GROUP BY CarID
, CarName
The grouping leaves you with a result akin to:
CarID CarName Total
1872 Olds 202,121
547841 BMW 175,298
9877 Ford 10,241
All fine and well. My question, though, is what is the best way to get the Total and the MAX Total into one table, in terms of performance and clean coding, so you have a result like:
CarID CarName Total Max Total
1872 Olds 202,121 202,121
547841 BMW 175,298 202,121
9877 Ford 10,241 202,121
One approach would be to put the GROUP result into a temp table, and then get the MAX from the temp table into a local variable. But I'm wondering what the best way to do this would be.
UPDATE
The Common Table Expression seems the most elegant to write, yet similar to @EBarr, my limited testing indicates a significantly slower performance. So I won't be going with the CTE.
As the link @EBarr has for the COMPUTE
option indicates the feature
is deprecated, that doesn't seem the best route, either.
The option of a local variable for the MAX value and the use of a temp table will likely be the route I go down, as I'm not aware of performance issues with it.
A bit more detail about my use case: it could probably end up being a series of other SO questions. But suffice to say that I'm loading a large subset of data into a temp table (so a subset of tbl_Cars is going into #tbl_Cars, and even #tbl_Cars may be further filtered and have aggregations performed on it), because I have to perform multiple filtering and aggregation queries on it within a single stored proc that returns multiple result sets.
UPDATE 2
@EBarr's use of a windowed function is nice and short. Note to self:
if using a RIGHT JOIN
to an outer reference table, the COUNT()
function should select a column from tbl_Cars, not '*'
.
SELECT M.MachineID
, M.MachineType
, COUNT(C.CarID) AS Total
, MAX(COUNT(C.CarID)) OVER() as MaxTotal
FROM dbo.tbl_Cars C
RIGHT JOIN dbo.tbl_Machines M
ON C.CarID = M.CarID
GROUP BY M.MachineID
, M.MachineType
In terms of speed, it seems fine, but at what point do you have to be worried about the number of reads?
Mechanically there are a few ways to do this. You could use temp tables/table variable. Another way is with nested queries and/or a CTE as @Aaron_Bertrand showed. A third way is to use WINDOWED FUNCTIONS such as...
SELECT CarName,
COUNT(*) as theCount,
MAX(Count(*)) OVER(PARTITION BY 'foo') as MaxPerGroup
FROM dbo.tbl_Cars
GROUP BY CarName
A DISFAVORED (read depricated) fourth way is using the COMPUTE keyword as such...
SELECT CarID, CarName, Count(*)
FROM dbo.tbl_Cars
GROUP BY CarID, CarName
COMPUTE MAX(Count(*))
The COMPUTE
keyword generates totals that appear as additional summary columns at the end of the result set (see this). In the query above you will actually see two record sets.
Fastest
Now, the next issue is what's the "best/fastest/easiest." I immediately think of an indexed view
. As @Aaron gently reminded me, indexed views have all sorts of restrictions. The above, strategy, however, allows you to create an indexed view on the SELECT...FROM..GROUP BY. Then selecting from the indexed view apply the WINDOWED FUNCTION clause.
Without knowing more, however, about your design it is going to be difficult for anyone tell you what's best. You will get lighting fast queries from an indexed view. That performance comes at a price, though. The price is maintenance costs. If the underlying table is the target of a large amount of insert/update/delete operations the maintenance of the indexed view will bog down performance in other areas.
If you share a bit more about your use case and data access patterns people will be able to share more insight.
MICRO PERFORMANCE TEST
So I generated a little data script and looked at sql profiler numbers for the CTE performance vs windowed functions. This is a micro-test, so try some real numbers in your system under real load.
Data generation:
Create table Cars ( CarID int identity (1,1) primary key,
CarName varchar(20),
value int)
GO
insert into Cars (CarName, value)
values ('Buick', 100),
('Ford', 10),
('Buick', 300),
('Buick', 100),
('Pontiac', 300),
('Bmw', 100),
('Mecedes', 300),
('Chevy', 300),
('Buick', 100),
('Ford', 200);
GO 1000
This script generates 10,000 rows. I then ran each of the four following queries multiple times :
--just group by
select CarName,COUNT(*) countThis
FROM Cars
GROUP BY CarName
--group by with compute (BAD BAD DEVELOPER!)
select CarName,COUNT(*) countThis
FROM Cars
GROUP BY CarName
COMPUTE MAX(Count(*));
-- windowed aggregates...
SELECT CarName,
COUNT(*) as theCount,
MAX(Count(*)) OVER(PARTITION BY 'foo') as MaxInAnyGroup
FROM Cars
GROUP BY CarName
--CTE version
;WITH x AS (
SELECT CarName,
COUNT(*) AS Total
FROM Cars
GROUP BY CarName
)
SELECT x.CarName, x.Total, x2.[Max Total]
FROM x CROSS JOIN (
SELECT [Max Total] = MAX(Total) FROM x
) AS x2;
After running the above queries, I created an indexed view on the "just group by" query above. Then I ran a query on the indexed view that performed a MAX(Count(*)) OVER(PARTITION BY 'foo'
.
AVERAGE RESULTS
Query CPU Reads Duration
--------------------------------------------------------
Group By 15 31 7 ms
Group & Compute 15 31 7 ms
Windowed Functions 14 56 8 ms
Common Table Exp. 16 62 15 ms
Windowed on Indexed View 0 24 0 ms
Obviously this is a micro-benchmark and only mildly instructive, so take it for what it's worth.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With