In my MS SQL Server database I am pulling transaction data based on a variety of different codes that are in one column.
Would it be more efficient to:
1. join the same table over and over, once for each code in a WHERE clause
2. do multiple CASE statements on the whole table (shown below)
3. do multiple CASE statements on the whole table, but limit it with a WHERE Subsid_Cde IN ('AA', 'BA', 'BB', etc.) clause
We have so many queries running per second that even though I have tried all 3 methods I get no definitive results.
SELECT
    ISNULL(SUM(CASE WHEN Subsid_Cde = 'AA' THEN Trans_Amt END), 0) AS [AA],
    ISNULL(SUM(CASE WHEN Subsid_Cde = 'BA' THEN Trans_Amt END), 0) AS [BA],
    ISNULL(SUM(CASE WHEN Subsid_Cde = 'BB' THEN Trans_Amt END), 0) AS [BB]
FROM
    Transactions
-- There are 8 more rows like this, using a different code for each line
If you're summing all (or most) of the possible values of the Subsid_Cde field, then CASE is faster, because it aggregates all the sums in a single pass instead of scanning the table multiple times. If you're only looking for a small subset of the possible Subsid_Cde values, then separate selects / joins (along with an index on Subsid_Cde) will work faster.
You need to learn to read Execution Plans; then you'll be able to figure such things out by yourself.
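On a busy server, wall-clock timings are noisy, so one way to compare the candidates is to look at logical reads alongside the execution plan. The following is only a sketch: it reuses the table and column names from the question and shows two of the candidate queries with a shortened code list.

```sql
-- Report I/O and CPU statistics for each statement in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Candidate: conditional aggregation over the whole table.
SELECT
    ISNULL(SUM(CASE WHEN Subsid_Cde = 'AA' THEN Trans_Amt END), 0) AS [AA],
    ISNULL(SUM(CASE WHEN Subsid_Cde = 'BA' THEN Trans_Amt END), 0) AS [BA]
FROM Transactions;

-- Candidate: the same aggregation, restricted to the codes of interest.
SELECT
    ISNULL(SUM(CASE WHEN Subsid_Cde = 'AA' THEN Trans_Amt END), 0) AS [AA],
    ISNULL(SUM(CASE WHEN Subsid_Cde = 'BA' THEN Trans_Amt END), 0) AS [BA]
FROM Transactions
WHERE Subsid_Cde IN ('AA', 'BA');

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The "logical reads" figure in the Messages output depends on the plan rather than on how loaded the server is at that moment, so it tends to be a steadier basis for comparison than elapsed time.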
Alternatively, you could do a GROUP BY on Subsid_Cde wrapped in a PIVOT clause (google for PIVOT MS SQL SERVER 2005).
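As a rough illustration of the GROUP BY part on its own (before any PIVOT is applied), something like the following returns one row per code instead of one column per code. The code list is shortened and is only an example.

```sql
-- One pass over the filtered rows; one output row per code.
SELECT
    Subsid_Cde,
    SUM(Trans_Amt) AS Total_Amt
FROM Transactions
WHERE Subsid_Cde IN ('AA', 'BA', 'BB')
GROUP BY Subsid_Cde;
```

Note that a code with no matching transactions simply produces no row here, whereas the CASE version in the question returns 0 for it.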
3 is your best bet. It's easy to read, it's easy to modify later on, and it should use the indexes that you've defined and expect to be using (still, check).
--1 Sometimes you have to join to the same table, but this isn't one of those times. Joining again every time you need to include a new Subsid_Cde makes for less readable SQL without really gaining anything.
--2 Transaction tables tend to grow very large, so you NEVER want to scan the entire table. So #2 is definitely out, unless the codes you'll be using in your query give you back all of the rows anyway.
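For reference, option 3 could look roughly like this. The index is an assumption on my part (its name and the INCLUDE column are hypothetical, and you may already have a suitable one), so check the plan before adding anything.

```sql
-- Hypothetical covering index: lets the query seek on the code
-- and read Trans_Amt from the index without touching the base table.
CREATE NONCLUSTERED INDEX IX_Transactions_Subsid_Cde
    ON Transactions (Subsid_Cde)
    INCLUDE (Trans_Amt);

-- Option 3: CASE aggregation, limited to the codes being reported.
SELECT
    ISNULL(SUM(CASE WHEN Subsid_Cde = 'AA' THEN Trans_Amt END), 0) AS [AA],
    ISNULL(SUM(CASE WHEN Subsid_Cde = 'BA' THEN Trans_Amt END), 0) AS [BA],
    ISNULL(SUM(CASE WHEN Subsid_Cde = 'BB' THEN Trans_Amt END), 0) AS [BB]
FROM Transactions
WHERE Subsid_Cde IN ('AA', 'BA', 'BB');
```

The list in the IN clause has to be kept in step with the CASE expressions; a code that appears in one but not the other will silently sum to 0.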
Use this:
SELECT (
SELECT SUM(Trans_Amt)
FROM Transactions
WHERE Subsid_Cde = 'AA'
) AS sum_aa,
(
SELECT SUM(Trans_Amt)
FROM Transactions
WHERE Subsid_Cde = 'BB'
) AS sum_bb
with no outer FROM or WHERE clause.
In SQL Server 2005+, use this:
SELECT [AA], [BB]
FROM (
    SELECT trans_amt, subsid_cde
    FROM transactions
) q
PIVOT (
    SUM(trans_amt)
    -- Column names go in brackets without quotes; PIVOT also needs an alias
    FOR subsid_cde IN ([AA], [BB])
) p
There are some very important considerations when you perform a task like this.
CASE is an expression and not a statement.
The code is going to re-evaluate 'everything relevant' for each WHEN.
In your instance of CASE, it's looking against a raw table and doesn't evaluate beyond one WHEN, so it isn't going to re-query. It's going to do one scan or seek, depending on your index, and return a result. It performs that for each CASE, but it would have to perform that operation for the JOIN as well; the fact that your result is coded into the CASE means that the system doesn't have to search for the value to check.
That last point is huge because it has some pretty major implications for both the business and you as the coder.
There is a really great article by Aaron Bertrand that goes into detail, "Dirty Secrets of the CASE Expression": https://sqlperformance.com/2014/06/t-sql-queries/dirty-secrets-of-the-case-expression
The summary is that it could potentially re-query for each WHEN. That doesn't sound too bad, but if your query already scales poorly, your CASE solution won't be scalable. And if you use ANY FUNCTION that re-evaluates at invocation (RAND() being a culprit), it can not only produce inconsistent results in some cases, but also take a huge performance hit if your query has a 'Big O notation that involves powers of N' (a very technical way of saying it uses a Sort, Union, Distinct or other forms of passive sort invocation, or worse, is engaging an outdated plan without up-to-date statistics). This means CASE could potentially double, triple or, as I like to call it, '(N)WHEN' the query time of a monster that is already suffering from an exponential iteration problem.
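A small sketch of the re-evaluation issue, along the lines of what the linked article discusses: in a simple CASE, the input expression can be evaluated again for each WHEN, so a non-deterministic input such as RAND() may fail to match any branch. The variable name below is just for illustration.

```sql
-- The input expression always yields 1, 2 or 3, yet this can
-- occasionally return NULL, because RAND() may be re-evaluated
-- for each WHEN comparison.
SELECT CASE CONVERT(SMALLINT, 1 + RAND() * 3)
           WHEN 1 THEN 'one'
           WHEN 2 THEN 'two'
           WHEN 3 THEN 'three'
       END AS result;

-- Evaluating the expression once, up front, avoids the problem.
DECLARE @n SMALLINT;
SET @n = CONVERT(SMALLINT, 1 + RAND() * 3);

SELECT CASE @n
           WHEN 1 THEN 'one'
           WHEN 2 THEN 'two'
           WHEN 3 THEN 'three'
       END AS result;
```

The query in the question compares a plain column against constants, so it is not exposed to this particular behaviour.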
Your CASE is going to require 'TECHNICAL MAINTENANCE' (whenever a new code is implemented, YOU are going to have to recode to factor that in); a JOIN is going to require 'USER MAINTENANCE' (the user alters the lookup table, and your query is updated without you needing to take any ownership).
'TECHNICAL MAINTENANCE' puts YOU at RISK of becoming entangled in the consequences of business mistakes.
You always want to code your solutions to minimise your ownership of the data throughput. You don't want to be in court talking about your little slice of involvement because you touched business logic; you want to be at home with your family, or on vacation, without getting phone calls about the 'CC' code they just added and want summed in your report.
In the instance of CASE, you have coded yourself into a situation where you are performing business logic that the business should be deciding in a lookup table that SHOULD be a JOIN.
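To make the JOIN alternative concrete, here is a minimal sketch. The lookup table Subsid_Codes and its Include_In_Report flag are hypothetical names that I've made up for illustration; they are not part of the question's schema.

```sql
-- Hypothetical lookup table, maintained by the business:
--   Subsid_Codes (Subsid_Cde, Include_In_Report)
-- Adding a row there brings a new code into the report
-- without any change to this query.
SELECT
    t.Subsid_Cde,
    SUM(t.Trans_Amt) AS Total_Amt
FROM Transactions AS t
INNER JOIN Subsid_Codes AS c
    ON c.Subsid_Cde = t.Subsid_Cde
WHERE c.Include_In_Report = 1
GROUP BY t.Subsid_Cde;
```

The trade-off is that the result comes back as one row per code rather than one column per code, so any pivoting into columns would have to happen in the report layer or in a further step.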
In short, when the business comes up with a new code, your report will not see it, which is going to cause you production draw, especially if you go on holiday and you are the only person who knows your CASE exists. You might not see it as much of an overhead, but you're not thinking long term: memory is hazy in the long run, and you may find yourself searching for your CASE and adding to it in the future, which you won't thank yourself for.
To save 4 seconds, you're personally inconveniencing yourself.
Sacrificing yourself at the altar of speed is not good coding practice.