Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

T-SQL Case statement strange behavior with newid() as randomness source

I'm using SQL Server 2012.

If I do the following to get a list of random-ish numbers in the range [1,3], it works just fine:

SELECT TOP 100 
    ABS(CHECKSUM(NEWID()))%3 + 1 [value_of_rand]
FROM sys.objects

and I get nice things like this (all between 1 and 3).

3
2
2
2
1
....etc.

But if I then put the output of that same chained-random-value function into a CASE statement, it apparently does not produce only the values 1,2,3.

SELECT TOP 100 
    CASE (ABS(CHECKSUM(NEWID()))%3 + 1)
        WHEN 1
            THEN 'one'
        WHEN 2
            THEN 'two'
        WHEN 3
            THEN 'three'
        ELSE
            'that is strange'
    END [value_of_case]
FROM sys.objects

It outputs:

three
that is strange
that is strange
one
two
...etc

What am I doing wrong here?

like image 350
Anssssss Avatar asked Sep 09 '15 16:09

Anssssss


2 Answers

Your:

SELECT TOP 100 
    CASE (ABS(CHECKSUM(NEWID()))%3 + 1)
        WHEN 1
            THEN 'one'
        WHEN 2
            THEN 'two'
        WHEN 3
            THEN 'three'
        ELSE
            'that is strange'
    END [value_of_case]
FROM sys.objects

Actually executed:

SELECT TOP 100 
    CASE 
        WHEN (ABS(CHECKSUM(NEWID()))%3 + 1) = 1  THEN 'one'
        WHEN (ABS(CHECKSUM(NEWID()))%3 + 1) = 2  THEN 'two'
        WHEN (ABS(CHECKSUM(NEWID()))%3 + 1) = 3  THEN 'three'
        ELSE 'that is strange'
    END [value_of_case]
FROM sys.objects;

Basically your expression is non-deterministic and each time is evaluated so you can end up with ELSE clause. So there is no bug or catch, just you use it with variable expression and it is perfectly normal behavior.

This is the same class like COALESCE syntactic-sugar

The COALESCE expression is a syntactic shortcut for the CASE expression. That is, the code COALESCE(expression1,...n) is rewritten by the query optimizer as the following CASE expression:

CASE

WHEN (expression1 IS NOT NULL) THEN expression1

WHEN (expression2 IS NOT NULL) THEN expression2

...

ELSE expressionN

END

This means that the input values (expression1, expression2, expressionN, etc.) will be evaluated multiple times. Also, in compliance with the SQL standard, a value expression that contains a subquery is considered non-deterministic and the subquery is evaluated twice. In either case, different results can be returned between the first evaluation and subsequent evaluations.

EDIT:

Solution: SqlFiddle

SELECT TOP 100 
    CASE t.col
        WHEN 1     THEN 'one'
        WHEN 2     THEN 'two'
        WHEN 3     THEN 'three'
        ELSE      'that is strange'
    END [value_of_case]
FROM sys.objects
CROSS APPLY ( SELECT ABS(CHECKSUM(NEWID()))%3 + 1 ) AS t(col)
like image 111
Lukasz Szozda Avatar answered Oct 06 '22 04:10

Lukasz Szozda


I think that the problem you are experiencing here is that (ABS(CHECKSUM(NEWID()))%3 + 1) isn't a value, it's an expression and SQL has the option of re-evaluating it whenever it wants to. You can try various syntactical things like removing the extra parenthesis or a CTE. That might make it go away (for now), but it might not, since logically it looks like the same request to the optimizer.

I think that the only sure-fire, supported way to stop this would be to save it first (to a temp table or a real one) and then use a second query to reference the saved values.

like image 27
RBarryYoung Avatar answered Oct 06 '22 02:10

RBarryYoung