I have the following dataset:
create schema m;
create table m.parent_child_lvl_1(customer_id,date,order_type,order_id,sub_id)
as values
(108384372,'18/09/2023'::date,'sub_parent_first_order',5068371361861,407284605)
,(108384372, '13/11/2023', 'sub_order', 5134167539781, null)
,(108384372, '8/01/2024', 'sub_order', 5214687526981, null)
,(108384372, '4/03/2024', 'sub_order', 5283166126149, null)
,(108384372, '18/06/2024', 'sub_parent_order', 5421811138629, 500649255)
,(108384372, '12/08/2024', 'sub_order', 5508433641541, null)
,(108384372, '12/08/2024', 'sub_order', 5508433641541, null);
I need to fill the null's with the Sub_ID until it hits the next non-null Sub_ID, then fill until the next, and so on.
The null order only works based on the Date ordering.
Also, needs to be grouped by Customer_ID. The full dataset has 15M rows, so any speed optimisation would be a bonus.
I've tried lead(), lag(), lastvalue(), coalesce() but the closest I got was only filling in the first null value after each real value.
The closest I've come to is this:
SELECT
first_value(sub_id) OVER (
PARTITION BY customer_id, value_partition
ORDER BY customer_id, date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS parent_sub_id,
customer_id,
order_id
FROM (
SELECT
customer_id,
order_id,
sub_id,
date,
SUM(CASE WHEN sub_id IS NULL THEN 0 ELSE 1 END) OVER (
ORDER BY customer_id, date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS value_partition
from m.parent_child_lvl_1
where customer_id in ('227330109','90872199','102694972')
and
order_type in (
'sub_parent_first_order',
'sub_parent_order',
'sub_order'
)
ORDER BY date ASC
) AS q
ORDER BY customer_id, date
;
Which seems to work for some records, but then goes wrong for others and I can't figure out why.
Edit: Here is a screenshot of the result set, with the column A returning actuals, and column B is what i actually need.

A shorter alternative, using Redshift ignore nulls option (not available in PostgreSQL):
IGNORE NULLS
The function returns the last value in the frame that is notNULL(orNULLif all values areNULL).
select*,last_value(sub_id) ignore nulls
over(partition by customer_id
order by date
rows between unbounded preceding
and current_row) as forward_filled_sub_id
from m.parent_child_lvl_1
where customer_id in ( '108384372','222222222','333333333','444444444'
, '555555555','666666666','777777777','888888888' )
and order_type in ( 'sub_parent_first_order'
, 'sub_parent_order'
, 'sub_order')
order by customer_id
, date
, order_id;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With