 

How to make LAG() ignore NULLS in SQL Server?

Does anyone know how to replace the NULLs in a column with the most recent string above them, so that each string fills the NULLs below it until the next string appears? I have a column that looks like this:

Original Column:

PAST_DUE_COL           
91 or more days past due
Null
Null
61-90 days past due          
Null
Null
31-60 days past due
Null
0-30 days past due
Null       
Null
Null            

Expected Result Column:

PAST_DUE_COL           
91 or more days past due        
91 or more days past due
91 or more days past due
61-90 days past due          
61-90 days past due 
61-90 days past due 
31-60 days past due
31-60 days past due
0-30 days past due
0-30 days past due      
0-30 days past due
0-30 days past due

Essentially I want the first string in the column to replace all null values below it until the next string. Then that string will replace all nulls below it until the next string and so on.

Ryan asked Feb 07 '20 00:02




2 Answers

SQL Server versions before 2022 do not support the IGNORE NULLS option for window functions such as lead() and lag(), which would have been a natural fit for this question.
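
For reference, SQL Server 2022 (16.x) added IGNORE NULLS to LAG(), LEAD(), FIRST_VALUE() and LAST_VALUE(). A minimal sketch of the fill-down on such a version follows; the five-row inline sample and the output column names filled_last_value and filled_lag are assumptions made for illustration, not part of the original question.

```sql
-- Requires SQL Server 2022 (16.x) or later; earlier versions reject IGNORE NULLS.
-- Hypothetical five-row sample standing in for mytable; id defines the row order.
with mytable as (
    select *
    from (values
        (1, '61-90 days past due'),
        (2, null),
        (3, null),
        (4, '31-60 days past due'),
        (5, null)
    ) as v (id, past_due_col)
)
select
    t.*,
    -- last non-null value from the first row up to and including the current row
    last_value(past_due_col) ignore nulls
        over(order by id rows between unbounded preceding and current row) as filled_last_value,
    -- same idea with lag(): keep the current value, else take the previous non-null one
    coalesce(past_due_col,
             lag(past_due_col) ignore nulls over(order by id)) as filled_lag
from mytable t
order by id;
```

Earlier versions reject the IGNORE NULLS keyword, so they need a different approach.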

We can work around this with a gaps-and-islands technique:

select
    t.*,
    max(past_due_col) over(partition by grp) new_past_due_col
from (
    select 
        t.*,
        sum(case when past_due_col is null then 0 else 1 end)
            over(order by id) grp
    from mytable t
) t

The subquery computes a window sum that increments every time a non-null value is found: this defines groups of rows, each made of a non-null value followed by its null values.

Then, the outer query uses a window max() to retrieve the (only) non-null value in each group.

This assumes that a column can be used to order the records (I called it id).
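
The DB Fiddle setup is not reproduced on this page. To try the query locally, a table along these lines should do; the column types are assumptions chosen to fit the query above, and the rows follow the sample column from the question.

```sql
-- Assumed schema: the types are guesses that fit the query; only the
-- names mytable, id and past_due_col come from the answer itself.
create table mytable (
    id           int primary key,      -- ordering column
    past_due_col varchar(50) null      -- value to fill down
);

insert into mytable (id, past_due_col) values
    (1,  '91 or more days past due'),
    (2,  null),
    (3,  null),
    (4,  '61-90 days past due'),
    (5,  null),
    (6,  null),
    (7,  '31-60 days past due'),
    (8,  null),
    (9,  '0-30 days past due'),
    (10, null),
    (11, null),
    (12, null);
```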

Demo on DB Fiddle:

ID | PAST_DUE_COL             | grp | new_past_due_col
-: | :----------------------- | --: | :-----------------------
 1 | 91 or more days past due |   1 | 91 or more days past due
 2 | null                     |   1 | 91 or more days past due
 3 | null                     |   1 | 91 or more days past due
 4 | 61-90 days past due      |   2 | 61-90 days past due
 5 | null                     |   2 | 61-90 days past due
 6 | null                     |   2 | 61-90 days past due
 7 | 31-60 days past due      |   3 | 31-60 days past due
 8 | null                     |   3 | 31-60 days past due
 9 | 0-30 days past due       |   4 | 0-30 days past due
10 | null                     |   4 | 0-30 days past due
11 | null                     |   4 | 0-30 days past due
12 | null                     |   4 | 0-30 days past due
GMB answered Oct 13 '22 10:10


This is a variation on GMB's answer. It is just a bit simpler, because count() on a column already ignores null values, so no case expression is needed:

select t.*,
       max(past_due_col) over(partition by grp) as new_past_due_col
from (select t.*,
             count(past_due_col) over (order by id) as grp
      from mytable t
     ) t;

Note that you need an ordering column of some sort for your question to even make sense.
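
To see that the two group definitions agree, a small sketch can compute both running totals side by side. The five-row inline sample and the column names grp_count and grp_sum are made up for illustration.

```sql
-- Hypothetical five-row sample standing in for mytable; id defines the row order.
with mytable as (
    select *
    from (values
        (1, '61-90 days past due'),
        (2, null),
        (3, null),
        (4, '31-60 days past due'),
        (5, null)
    ) as v (id, past_due_col)
)
select
    t.*,
    -- running count of non-null values: count(column) skips NULLs
    count(past_due_col) over(order by id) as grp_count,
    -- the same running total written with an explicit case expression
    sum(case when past_due_col is null then 0 else 1 end) over(order by id) as grp_sum
from mytable t
order by id;
```

Both columns come out as 1, 1, 1, 2, 2 for this sample. Any null rows that precede the first non-null value would land in group 0 and stay null after the max().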

Another approach uses outer apply, which looks up, for each row, the most recent row at or before it that has a non-null value:

select t.*, t2.past_due_col as new_past_due_col
from mytable t outer apply
     (select top (1) t2.*
      from mytable t2
      where t2.id <= t.id and t2.past_due_col is not null
      order by t2.id desc
     ) t2;
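
Since only one column is taken from the lookup, it could arguably also be written as a correlated scalar subquery. This is a sketch under that assumption, not part of the original answer; the five-row inline sample is hypothetical.

```sql
-- Hypothetical five-row sample standing in for mytable; id defines the row order.
with mytable as (
    select *
    from (values
        (1, '61-90 days past due'),
        (2, null),
        (3, null),
        (4, '31-60 days past due'),
        (5, null)
    ) as v (id, past_due_col)
)
select
    t.*,
    -- for each row, fetch the value of the latest non-null row at or before it
    (select top (1) t2.past_due_col
     from mytable t2
     where t2.id <= t.id and t2.past_due_col is not null
     order by t2.id desc) as new_past_due_col
from mytable t
order by t.id;
```

Like outer apply, this returns null for rows that have no non-null value at or before them.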
Gordon Linoff answered Oct 13 '22 10:10