Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL (AWS Redshift) Forward-fill null values based on previous non-null value in the same column

I have the following dataset:

create schema m;
create table m.parent_child_lvl_1(customer_id,date,order_type,order_id,sub_id)
as values
 (108384372,'18/09/2023'::date,'sub_parent_first_order',5068371361861,407284605)
,(108384372, '13/11/2023', 'sub_order', 5134167539781, null)
,(108384372, '8/01/2024', 'sub_order', 5214687526981, null)
,(108384372, '4/03/2024', 'sub_order', 5283166126149, null)
,(108384372, '18/06/2024', 'sub_parent_order', 5421811138629, 500649255)
,(108384372, '12/08/2024', 'sub_order', 5508433641541, null)
,(108384372, '12/08/2024', 'sub_order', 5508433641541, null);

I need to fill the null's with the Sub_ID until it hits the next non-null Sub_ID, then fill until the next, and so on. The null order only works based on the Date ordering. Also, needs to be grouped by Customer_ID. The full dataset has 15M rows, so any speed optimisation would be a bonus.

I've tried lead(), lag(), lastvalue(), coalesce() but the closest I got was only filling in the first null value after each real value.

The closest I've come to is this:

SELECT
  first_value(sub_id) OVER (
    PARTITION BY customer_id, value_partition 
    ORDER BY customer_id, date 
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS parent_sub_id,
  customer_id,
  order_id
FROM (
  SELECT
customer_id,
order_id,
sub_id,
date,
    SUM(CASE WHEN sub_id IS NULL THEN 0 ELSE 1 END) OVER (
      ORDER BY customer_id, date 
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS value_partition
from m.parent_child_lvl_1 
where customer_id in ('227330109','90872199','102694972')
and
order_type in (
'sub_parent_first_order',
'sub_parent_order',
'sub_order'
)
  ORDER BY date ASC
) AS q
ORDER BY customer_id, date
 ;

Which seems to work for some records, but then goes wrong for others and I can't figure out why.

Edit: Here is a screenshot of the result set, with the column A returning actuals, and column B is what i actually need.

enter image description here

like image 926
Maz Avatar asked Nov 20 '25 19:11

Maz


1 Answers

A shorter alternative, using Redshift ignore nulls option (not available in PostgreSQL):

IGNORE NULLS
The function returns the last value in the frame that is not NULL (or NULL if all values are NULL).

select*,last_value(sub_id) ignore nulls
          over(partition by customer_id
               order by date
               rows between unbounded preceding 
                        and current_row) as forward_filled_sub_id
from m.parent_child_lvl_1 
where customer_id in (  '108384372','222222222','333333333','444444444'
                      , '555555555','666666666','777777777','888888888' )
  and order_type in (  'sub_parent_first_order'
                     , 'sub_parent_order'
                     , 'sub_order')
order by customer_id
       , date
       , order_id;
like image 71
Zegarek Avatar answered Nov 22 '25 09:11

Zegarek



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!