Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Randomly generated values in CASE statements return NULL

According to this post, the correct way of generating a random value in SQL Server is:

ABS(CHECKSUM(NewId())) % 14   -- Returns a value between 0 and 13

However, when using this expression in a case statement, such as:

SELECT
    CASE ABS(CHECKSUM(NEWID())) % 4 
        WHEN 0 THEN 'String A' 
        WHEN 1 THEN 'String B' 
        WHEN 2 THEN 'String C' 
        WHEN 3 THEN 'String D'
    END AS RandomString -- Returns String A, B, C, D and NULLs.
FROM sys.all_objects

The outputtet RandomString column contains some NULLs, as can be seen in this SQL fiddle. I have found that I can wrap the randomizing expression in a CTE to avoid NULLs in the output, but I still wonder why the code above returns NULLs?

WITH RandomNumber AS (
    SELECT ABS(CHECKSUM(NEWID())) % 4 AS N FROM sys.all_objects
)
SELECT TOP 100
    CASE N
        WHEN 0 THEN 'String A' 
        WHEN 1 THEN 'String B' 
        WHEN 2 THEN 'String C' 
        WHEN 3 THEN 'String D'
    END AS RandomString -- Does not return any NULLs. Only String A, B, C and D.
FROM RandomNumber

I have tried to generate the random number using a slightly different method, but the result is the same:

CAST(RAND(CHECKSUM(NEWID())) * 4 AS INT)  -- Returns a value between 0 and 3

This seems to be a problem on SQL Server 2014, I have not tested it on other versions.

like image 489
Dan Avatar asked May 05 '15 13:05

Dan


People also ask

How do you handle NULL values in a case statement?

SQL offers two case abbreviations to cope with null : coalesce and nullif . Both are used like functions and do not use the keywords case , when , then , else and end .

How do you return a NULL in a SQL CASE statement?

If no value/condition is found to be TRUE, then the CASE statement will return the value in the ELSE clause. If the ELSE clause is omitted and no condition is found to be true, then the CASE statement will return NULL.

Does Min return NULL?

When does the MIN function return a NULL value? We're glad that you asked. The MIN function returns a NULL value if and only if there are only NULL values in the set.


2 Answers

It generates NULLs because there's no guarantee on how many times a particular expression will be evaluated.

What you wanted was for SQL Server to do something along the lines of:

let x = GenerateRandomNumber()
if x = 1 then 'String 1'
if x = 2 then 'String 2'
if x = 3 then 'String 3'
if x = 4 then 'String 4'

(Where GenerateRandomNumber() is ABS(CHECKSUM(NEWID())) % 4); but what SQL Server actually did is:

if GenerateRandomNumber() = 1 then 'String 1'
if GenerateRandomNumber() = 2 then 'String 2'
if GenerateRandomNumber() = 3 then 'String 3'
if GenerateRandomNumber() = 4 then 'String 4'

So you only get a non-NULL result if you happen to pick the right random number for one particular comparison operation.

I think that even with the CTE, there's no guarantee that SQL Server won't ever generate something like the second code block above. If you want a stable, generated once random number, you need to arrange to store that value somewhere (e.g. in a table variable or temp table).


The reason why I focus on guarantees is that you don't want to end up writing code based on current observed behaviour. There were a lot of "issues" reported when SQL Server 2008 stopped "ordering" results in views that we're using the TOP 100 PERCENT ... ORDER BY trick - something that happened to work (mostly) on 2005 and earlier versions but ceased to do so.

Similarly, if someone asked my for an expression that returns the number 5, I could supply them the expression DATEPART(day,GETUTCDATE()) and let them run as many queryies over as many rows as they like - for the next 8 hours - but that doesn't mean that I'd recommend it as a solution to their problem.

And also, we know that there are areas where SQL Server's decisions about evaluation order can be surprising.

like image 88
Damien_The_Unbeliever Avatar answered Oct 09 '22 06:10

Damien_The_Unbeliever


The expression is being rewritten internally to the same logic as

CASE WHEN ABS(CHECKSUM(NEWID())) % 4 = 1 THEN x 
     WHEN ABS(CHECKSUM(NEWID())) % 4 = 2 THEN y 
     WHEN ...

This is why the syntax in the question will never work. The expression is being evaluated for each WHEN.

The way you can fix it is:

SELECT 
    CASE x % 4 
        WHEN 0 THEN 'String A' 
        WHEN 1 THEN 'String B' 
        WHEN 2 THEN 'String C' 
        WHEN 3 THEN 'String D'
    END AS RandomString -- Returns String A, B, C, D and NULLs.
FROM sys.all_objects
CROSS APPLY (SELECT ABS(CHECKSUM(NEWID())) x) y

Using CROSS APPLY will only calculate it once

like image 25
t-clausen.dk Avatar answered Oct 09 '22 06:10

t-clausen.dk