I want to "left join" a table so that a value is joined not just to a matching row, but also to any subsequent non-matching rows, up to the next matching row. To put it another way, I want to fill in nulls with the previous non-null value. Sample data and desired result: Table <code>x</code>: <pre class="prettyprint"><code> id ---- 1 2 3 4 5 </code></pre> Table <code>y</code>: <pre class="prettyprint"><code> id | val ----+----- 1 | a 4 | b </code></pre> Result of <code>select x.id, y.val from x left join y on x.id=y.id order by x.id;</code>: <pre class="prettyprint"><code> id | val ----+----- 1 | a 2 | 3 | 4 | b 5 | </code></pre> Desired result: <pre class="prettyprint"><code> id | val ----+----- 1 | a 2 | a 3 | a 4 | b 5 | b </code></pre>

<pre class="prettyprint"><code>SELECT m.id, y.val FROM ( SELECT x.id, MAX(y.id) id_y FROM x INNER JOIN y ON x.id >= y.id GROUP BY x.id ) m INNER JOIN y ON m.id_y = y.id ORDER BY m.id </code></pre> Please see fiddle here.

How do I write a join with this unusual matching criteria?

Tags:

sql

join

postgresql

greatest-n-per-group

I want to "left join" a table so that a value is joined not just to a matching row, but also to any subsequent non-matching rows, up to the next matching row. To put it another way, I want to fill in nulls with the previous non-null value.

Sample data and desired result:

Table x:

 id 
----
  1
  2
  3
  4
  5

Table y:

 id | val 
----+-----
  1 | a
  4 | b

Result of select x.id, y.val from x left join y on x.id=y.id order by x.id;:

 id | val 
----+-----
  1 | a
  2 | 
  3 | 
  4 | b
  5 |

Desired result:

 id | val 
----+-----
  1 | a
  2 | a
  3 | a
  4 | b
  5 | b

733

asked Apr 07 '13 10:04

jl6

4 Answers

Indices

Create indices on x.id and y.id - which you probably already have if those are your primary keys.
A multi-column index may help, too, especially with index only scans in pg 9.2+:

CREATE INDEX y_mult_idx ON y (id DESC, val)

However, in my tests, this index was not used at first. Had to add (otherwise pointless) val to ORDER BY to convince the query planner that the sort order matches. See query 3.

The index makes little difference in this synthetic setup. But for tables with more columns, retrieving val from the table becomes increasingly expensive, making the "covering" index more attractive.

Queries

1) Simple

SELECT DISTINCT ON (x.id)
       x.id, y.val
FROM   x
JOIN   y ON y.id <= x.id
ORDER  BY x.id, y.id DESC;

SQL Fiddle.

More explanation for the technique with DISTINCT in this related answer:

Select first row in each GROUP BY group?

I ran some tests because I had my suspicions that the first query wouldn't scale well. It's fast with a small table, but no good with bigger tables. Postgres doesn't optimize the plan and starts with a (limited) cross join, with a cost of O(N²).

2) Fast

This query is still rather simple and scales excellently:

SELECT x.id, y.val
FROM   x
JOIN  (SELECT *, lead(id, 1, 2147483647) OVER (ORDER BY id) AS next_id FROM y) y
       ON  x.id >= y.id
       AND x.id <  y.next_id
ORDER  BY 1;

The window function lead() is instrumental. I make use of the option to provide a default to cover the corner case of the last row: 2147483647 is the biggest possible integer. Adapt to your data type.

3) Very simple and almost as fast

SELECT x.id
     ,(SELECT val FROM y WHERE id <= x.id ORDER BY id DESC, val LIMIT 1) AS val
FROM   x;

Normally, correlated subqueries tend to be slow. But this one can just pick a value from the (covering) index and is otherwise so simple that it can compete.

The additional ORDER BY item val (bold emphasis) seems pointless. But adding it convinces the query planner that it's ok to use the multi-column index y_mult_idx from above, because the sort order matches. Note the

Index Only Scan using y_mult_idx ..

in the EXPLAIN output.

Test case

After a lively debate and multiple updates I collected all queries posted so far and made a test case for a quick overview. I only use 1000 rows so SQLfiddle does not time out with the slower queries. But the top 4 (Erwin 2, Clodoaldo, a_horse, Erwin 3) scale linearly in all my local tests. Updated once more to include my latest addition, improve format and order by performance now:

Big SQL Fiddle comparing performance.

196

answered Sep 30 '22 19:09

Erwin Brandstetter

SQL Fiddle

select
    id, 
    first_value(val) over(partition by g order by id) val
from (
    select
        x.id, val, count(val) over(order by x.id) g
    from
        x
        left join
        y on x.id=y.id
) s
order by id

answered Sep 30 '22 21:09

Clodoaldo Neto

select id,
       first_value(t.val) over (partition by group_flag order by t.id) as val
from (
  select x.id, 
         y.val, 
         sum(case
            when y.val is null then 0
            else 1
          end) over (order by x.id) as group_flag
  from x 
    left join y on x.id=y.id
) t    
order by id;

SQLFiddle example: http://sqlfiddle.com/#!12/38903/1

answered Sep 30 '22 19:09

a_horse_with_no_name

SELECT
  m.id,
  y.val
FROM (
  SELECT
    x.id,
    MAX(y.id) id_y
  FROM
    x INNER JOIN y ON x.id >= y.id
  GROUP BY
    x.id
  ) m INNER JOIN y ON m.id_y = y.id
ORDER BY
  m.id

Please see fiddle here.

answered Sep 30 '22 20:09

fthiella

Related questions
                            
                                Android SQL: Check if Record in Database Exists
                            
                                Get count percent of a record in a single query
                            
                                Oracle find tablename given an index name
                            
                                Good idea to specify schema as part of constraint name?
                            
                                Dropping unique constraint for column in H2
                            
                                showing sum in last row of table using mysql
                            
                                SQL Real vs Float
                            
                                Aggregate function over a given time interval
                            
                                sqlite char, ascii function
                            
                                Select items like records from a column in another table
                            
                                Adding the NOT_NULL constraint to an SQL column
                            
                                SSIS importing datetime column into SQL Server 2008
                            
                                Java - Getting Data from MySQL database
                            
                                How do I stop a MySQL decimal field from being rounded?
                            
                                SQL QUERY using LEFT JOIN and CASE Statement
                            
                                Reseed identity column in SQL Compact
                            
                                How to extract hour for MySQL FROM_UNIXTIME() results?
                            
                                Dynamic SQL taking much much longer than the hard-coded equivalent
                            
                                Return the count of rows grouped by week MySQL
                            
                                Getting SQL Error: ORA-00957: duplicate column name, while creating view

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With