Simplifying (aliasing) T-SQL CASE statements. Any improvement possible?

Tags: sql-server, tsql, case

As you can see, this sucks big time. Any alternative? I've tried using the column alias in the group by clause to no avail.

select count(callid) ,
case
        when callDuration > 0 and callDuration < 30 then 1
        when callDuration >= 30 and callDuration < 60 then 2
        when callDuration >= 60 and callDuration < 120 then 3
        when callDuration >= 120 and callDuration < 180 then 4
        when callDuration >= 180 and callDuration < 240 then 5
        when callDuration >= 240 and callDuration < 300 then 6
        when callDuration >= 300 and callDuration < 360 then 7
        when callDuration >= 360 and callDuration < 420 then 8
        when callDuration >= 420 and callDuration < 480 then 9
        when callDuration >= 480 and callDuration < 540 then 10
        when callDuration >= 540 and callDuration < 600 then 11
        when callDuration >= 600 then 12
end as duration
from callmetatbl
where programid = 1001 and callDuration > 0
group by case
        when callDuration > 0 and callDuration < 30 then 1
        when callDuration >= 30 and callDuration < 60 then 2
        when callDuration >= 60 and callDuration < 120 then 3
        when callDuration >= 120 and callDuration < 180 then 4
        when callDuration >= 180 and callDuration < 240 then 5
        when callDuration >= 240 and callDuration < 300 then 6
        when callDuration >= 300 and callDuration < 360 then 7
        when callDuration >= 360 and callDuration < 420 then 8
        when callDuration >= 420 and callDuration < 480 then 9
        when callDuration >= 480 and callDuration < 540 then 10
        when callDuration >= 540 and callDuration < 600 then 11
        when callDuration >= 600 then 12
end

EDIT: I really meant to ask how to have a single case source, but case modifications are welcome anyway (although less useful because the intervals probably will be modified and might even be automatically generated).

As some people suspected, callDuration is indeed a float, so some of the listed solutions are not valid for my use case because they leave values out of the intervals.

Lessons:

Look for patterns in the case expression to reduce it if possible and worthwhile

 case
    when callDuration > 0 AND callDuration < 30 then 1
    when callDuration >= 600 then 12
    else floor(callDuration/60) + 2
 end as duration

Use inline views to have a single source of the case

select count(d.callid), d.duration
from (   
   select callid
        , case
           when callDuration > 0 AND callDuration < 30 then 1
           when callDuration >= 600 then 12
           else floor(callDuration/60) + 2
          end as duration
    from callmetatbl
    where programid = 1001
          and callDuration > 0
) d
group by d.duration

Or use common table expressions

with duration_case as (
   select callid
        , case
           when callDuration > 0 AND callDuration < 30 then 1
           when callDuration >= 600 then 12
           else floor(callDuration/60) + 2
          end as duration
   from callmetatbl
   where programid = 1001 and callDuration > 0
)
select count(callid), duration
from duration_case
group by duration

Or use a user-defined function (no example so far :-) )

Or use a lookup table and a join

DECLARE @t TABLE(durationFrom float, durationTo float, result INT)
--populate table with values so the query works
--LEFT JOIN so that calls matching no interval (e.g. >= 600) fall back to 12
select count(callid) , COALESCE(t.result, 12) as duration
from callmetatbl LEFT JOIN @t AS t ON callDuration >= t.durationFrom 
AND callDuration < t.durationTo 
where programid = 1001 and callDuration > 0
group by COALESCE(t.result, 12)
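
One way to fill in the "populate" step, as a rough sketch. The rows are simply the half-open intervals from the original CASE expression, and the multi-row VALUES syntax assumes SQL Server 2008 or later. Calls of 600 seconds or more match no row, so the LEFT JOIN plus COALESCE puts them in bucket 12.

```sql
-- Lookup table: each row covers durationFrom <= callDuration < durationTo.
DECLARE @t TABLE (durationFrom FLOAT, durationTo FLOAT, result INT);

INSERT INTO @t (durationFrom, durationTo, result)
VALUES (0, 30, 1), (30, 60, 2), (60, 120, 3), (120, 180, 4)
     , (180, 240, 5), (240, 300, 6), (300, 360, 7), (360, 420, 8)
     , (420, 480, 9), (480, 540, 10), (540, 600, 11);

-- No row covers >= 600, so those calls get NULL from the LEFT JOIN
-- and COALESCE maps them to bucket 12.
SELECT COUNT(c.callid) AS calls
     , COALESCE(t.result, 12) AS duration
  FROM callmetatbl AS c
  LEFT JOIN @t AS t
    ON c.callDuration >= t.durationFrom
   AND c.callDuration <  t.durationTo
 WHERE c.programid = 1001
   AND c.callDuration > 0
 GROUP BY COALESCE(t.result, 12);
```

The first interval starts at 0 inclusive, but the WHERE clause already excludes callDuration = 0, so the result matches the original first bucket.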

Thanks to everybody and I'm having a very difficult time choosing an accepted answer, as many covered different parts of the question (and I was there thinking it was a simple question with a straightforward answer :-), sorry for the confusion).

asked Jun 04 '09 17:06

Vinko Vrsalovic

1 Answer

Q: how to get an alias to use in the GROUP BY clause

One approach is to use an inline view. [EDIT] The answer from Remus Rusanu (+1!) gives an example of a Common Table Expression to accomplish the same thing. [/EDIT]

The inline view gets you a simple "alias" for the complex expression which you can then reference in a GROUP BY clause in an outer query:

select count(d.callid)
     , d.duration
  from (select callid
             , case
               when callDuration >= 600 then 12
               when callDuration >= 540 then 11
               when callDuration >= 480 then 10
               when callDuration >= 420 then 9
               when callDuration >= 360 then 8
               when callDuration >= 300 then 7
               when callDuration >= 240 then 6
               when callDuration >= 180 then 5
               when callDuration >= 120 then 4
               when callDuration >=  60 then 3
               when callDuration >=  30 then 2
               when callDuration >    0 then 1
               --else null
               end as duration
             from callmetatbl
            where programid = 1001
              and callDuration > 0
       ) d
group by d.duration

Let's unpack that.

the inner (indented) query is called an inline view (we have given it the alias d)
in the outer query, we can reference the alias duration from d

That should be sufficient to answer your question. If you're looking for an equivalent replacement expression, the one from tekBlues (+1 !) is the right answer (it works on the boundary and for non-integers.)

With the replacement expression from tekBlues (+1!):

select count(d.callid)
     , d.duration
  from (select callid
             , case 
               when callduration >=30 and callduration<600
                    then floor(callduration/60)+2
               when callduration>0 and callduration< 30
                    then 1 
               when callduration>=600
                    then 12
               end as duration
          from callmetatbl
         where programid = 1001
           and callDuration > 0
       ) d
 group by d.duration

(This should be sufficient to answer your question.)
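
If you want to convince yourself that the two expressions agree on your own data, a query along these lines may help. It is only a sketch: by_case and by_formula are made-up column aliases, and any row it returns is a call on which the two bucketing expressions disagree.

```sql
-- Sanity check: list calls where the two bucketing expressions differ.
SELECT x.callid, x.callDuration, x.by_case, x.by_formula
  FROM (SELECT callid
             , callDuration
             , CASE                       -- lower-bound-only version
               WHEN callDuration >= 600 THEN 12
               WHEN callDuration >= 540 THEN 11
               WHEN callDuration >= 480 THEN 10
               WHEN callDuration >= 420 THEN 9
               WHEN callDuration >= 360 THEN 8
               WHEN callDuration >= 300 THEN 7
               WHEN callDuration >= 240 THEN 6
               WHEN callDuration >= 180 THEN 5
               WHEN callDuration >= 120 THEN 4
               WHEN callDuration >=  60 THEN 3
               WHEN callDuration >=  30 THEN 2
               WHEN callDuration >    0 THEN 1
               END AS by_case
             , CASE                       -- arithmetic version
               WHEN callDuration >= 30 AND callDuration < 600
                    THEN FLOOR(callDuration/60) + 2
               WHEN callDuration > 0 AND callDuration < 30
                    THEN 1
               WHEN callDuration >= 600
                    THEN 12
               END AS by_formula
          FROM callmetatbl
         WHERE callDuration > 0
       ) x
 WHERE x.by_case <> x.by_formula;
```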

[UPDATE:] sample user defined function (a replacement for inline CASE expression)

CREATE FUNCTION [dev].[udf_duration](@cd FLOAT)
RETURNS SMALLINT
AS
BEGIN
  DECLARE @bucket SMALLINT
  SET @bucket = 
  CASE
  WHEN @cd >= 600 THEN 12
  WHEN @cd >= 540 THEN 11
  WHEN @cd >= 480 THEN 10
  WHEN @cd >= 420 THEN 9
  WHEN @cd >= 360 THEN 8
  WHEN @cd >= 300 THEN 7
  WHEN @cd >= 240 THEN 6
  WHEN @cd >= 180 THEN 5
  WHEN @cd >= 120 THEN 4
  WHEN @cd >=  60 THEN 3
  WHEN @cd >=  30 THEN 2
  WHEN @cd >    0 THEN 1
  --ELSE NULL
  END
  RETURN @bucket
END

select count(callid)
     , [dev].[udf_duration](callDuration)
  from callmetatbl
 where programid = 1001
   and callDuration > 0
 group by [dev].[udf_duration](callDuration)

NOTES: be aware that the user defined function will add overhead, and (of course) add a dependency on another database object.

This example function is equivalent to the original expression. The OP's CASE expression doesn't have any gaps, but it does reference each "breakpoint" twice; I prefer to test only the lower bound. (CASE returns at the first condition that is satisfied. Doing the tests in reverse order lets the unhandled case (<= 0 or NULL) fall through without a test; an ELSE NULL is not necessary, but could be added for completeness.)

ADDITIONAL DETAILS

(Be sure to check the performance and the optimizer plan, to make sure it's the same as (or not significantly worse than) the original. In the past, I've had problems getting predicates pushed into the inline view, but it doesn't look like that will be a problem in your case.)
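
A simple way to compare, sketched here with the inline view version: turn on the I/O and timing statistics, run each variant, and compare the "logical reads" and "CPU time" figures reported in the messages output. Run the original query the same way for the baseline.

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Variant under test: inline view with the arithmetic CASE expression.
SELECT COUNT(d.callid)
     , d.duration
  FROM (SELECT callid
             , CASE
               WHEN callDuration >= 30 AND callDuration < 600
                    THEN FLOOR(callDuration/60) + 2
               WHEN callDuration > 0 AND callDuration < 30
                    THEN 1
               WHEN callDuration >= 600
                    THEN 12
               END AS duration
          FROM callmetatbl
         WHERE programid = 1001
           AND callDuration > 0
       ) d
 GROUP BY d.duration;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```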

stored view

Note that the inline view could also be stored as a view definition in the database. But there's no reason to do that, other than to "hide" the complex expression from your statement.
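
For illustration only, a stored view might look something like this. The view name and the dbo schema are made up, and GO is the batch separator used by SSMS and sqlcmd (CREATE VIEW has to be the first statement in its batch).

```sql
-- Hypothetical view name and schema; adjust to your own conventions.
CREATE VIEW dbo.vw_callmeta_duration
AS
SELECT callid
     , programid
     , callDuration
     , CASE
       WHEN callDuration >= 30 AND callDuration < 600
            THEN FLOOR(callDuration/60) + 2
       WHEN callDuration > 0 AND callDuration < 30
            THEN 1
       WHEN callDuration >= 600
            THEN 12
       END AS duration
  FROM callmetatbl;
GO

-- The outer query can now group by the view's duration column directly.
SELECT COUNT(v.callid)
     , v.duration
  FROM dbo.vw_callmeta_duration v
 WHERE v.programid = 1001
   AND v.callDuration > 0
 GROUP BY v.duration;
```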

simplifying the complex expression

Another way to make a complex expression "simpler" is to use a user defined function. But a user defined function comes with its own set of issues (including degraded performance.)

add database "lookup" table

Some answers recommend adding a "lookup" table to the database. I don't see that this is really necessary. It could be done of course, and could make sense if you want to be able to derive different values for duration from callDuration, on the fly, without having to modify your query and without having to run any DDL statements (e.g. to alter a view definition, or modify a user defined function).

With a join to a "lookup" table, one benefit is that you could make the query return different result sets by just performing DML operations on the "lookup" table.

But that same advantage may actually be a drawback as well.

Consider carefully if the benefit actually outweighs the downside. Consider the impact that new table will have on unit testing, how to verify the contents of the lookup table are valid and not changed (any overlaps? any gaps?), impact on ongoing maintenance to the code (due to the additional complexity).
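
To give an idea of what "verifying the contents" could involve, here is a rough sketch of a gap/overlap check. It assumes the lookup table has the shape shown in the question (durationFrom, durationTo, result) with half-open intervals, and it uses LEAD, so it needs SQL Server 2012 or later. The sample rows are deliberately broken to show both kinds of problem.

```sql
DECLARE @t TABLE (durationFrom FLOAT, durationTo FLOAT, result INT);

-- Sample data with one gap (120 to 130) and one overlap (170 to 180).
INSERT INTO @t (durationFrom, durationTo, result)
VALUES (0, 30, 1), (30, 60, 2), (60, 120, 3), (130, 180, 4), (170, 240, 5);

-- Pair each interval with the start of the next one (ordered by start).
WITH ordered AS (
    SELECT durationFrom
         , durationTo
         , result
         , LEAD(durationFrom) OVER (ORDER BY durationFrom) AS nextFrom
      FROM @t
)
-- An interval is fine when the next one starts exactly where it ends.
SELECT result
     , durationTo
     , nextFrom
     , CASE WHEN nextFrom > durationTo THEN 'gap' ELSE 'overlap' END AS problem
  FROM ordered
 WHERE nextFrom IS NOT NULL
   AND nextFrom <> durationTo;
```

With the sample rows above, the check flags the interval with result 3 (gap) and the one with result 4 (overlap).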

some BIG assumptions

A lot of the answers given here seem to assume that callDuration is an INTEGER datatype. It seems they have overlooked the possibility that it's not an integer, but maybe I missed that nugget in the question.

It's a fairly simple test case to demonstrate that:

callDuration BETWEEN 0 AND 30

is NOT equivalent to

callDuration > 0 AND callDuration < 30
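
A minimal sketch of such a test case follows. The sample values are arbitrary, and the multi-row VALUES syntax assumes SQL Server 2008 or later. BETWEEN includes both endpoints, so the rows for 0 and 30 are the ones where the first two flags differ. The last two columns show why an integer-style range such as BETWEEN 30 AND 59 loses a float value like 59.5.

```sql
DECLARE @samples TABLE (callDuration FLOAT);

INSERT INTO @samples (callDuration)
VALUES (0), (0.5), (29.9), (30), (30.5), (59.5);

SELECT callDuration
     , CASE WHEN callDuration BETWEEN 0 AND 30
            THEN 1 ELSE 0 END AS in_between_0_30      -- inclusive on both ends
     , CASE WHEN callDuration > 0 AND callDuration < 30
            THEN 1 ELSE 0 END AS in_open_0_30         -- excludes 0 and 30
     , CASE WHEN callDuration BETWEEN 30 AND 59
            THEN 1 ELSE 0 END AS in_between_30_59     -- integer-style range
     , CASE WHEN callDuration >= 30 AND callDuration < 60
            THEN 1 ELSE 0 END AS in_half_open_30_60   -- range from the question
  FROM @samples
 ORDER BY callDuration;
```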

answered Oct 11 '22 09:10

13 revs
