 

Postgres: left join with order by and limit 1

I have the following situation:

Table1 has a list of companies.
Table2 has a list of addresses.
Table3 is an N:N (many-to-many) relationship between Table1 and Table2, with fields 'begin' and 'end'.
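A minimal schema sketch of these three tables may make the setup concrete. It assumes the table and key names used in the answers below (companies, addresses, relationship, company_id, address_id); the extra columns, types and sample rows are hypothetical. "end" has to be double-quoted because END is a reserved word in PostgreSQL, and "begin" is quoted for consistency.

```sql
-- Hypothetical sample schema (temporary tables, so nothing persists).
CREATE TEMP TABLE companies (
    company_id integer PRIMARY KEY,
    name       text    NOT NULL
);

CREATE TEMP TABLE addresses (
    address_id integer PRIMARY KEY,
    street     text    NOT NULL
);

-- Many-to-many link with a validity period; begin/end are never NULL.
CREATE TEMP TABLE relationship (
    company_id integer NOT NULL REFERENCES companies,
    address_id integer NOT NULL REFERENCES addresses,
    "begin"    date    NOT NULL,
    "end"      date    NOT NULL
);

INSERT INTO companies VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
INSERT INTO addresses VALUES
    (10, 'Old St 1'), (11, 'Mid St 2'), (12, 'New St 3'), (20, 'Main St 9');

-- Acme moved twice, Globex never moved, Initech has no address yet.
INSERT INTO relationship VALUES
    (1, 10, DATE '2005-01-01', DATE '2009-12-31'),
    (1, 11, DATE '2010-01-01', DATE '2014-12-31'),
    (1, 12, DATE '2015-01-01', DATE '9999-12-31'),
    (2, 20, DATE '2008-06-01', DATE '9999-12-31');
```

With this data a plain LEFT JOIN from companies through relationship to addresses returns three rows for Acme, which is exactly the duplication described below.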

Because companies may move over time, a LEFT JOIN across these tables returns multiple rows for each company.

The begin and end fields are never NULL. To find the latest address I can use ORDER BY begin DESC, and LIMIT 1 removes the older addresses.
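For a single company, the ORDER BY ... DESC plus LIMIT 1 approach might look like the following sketch. The inline rows are hypothetical and the column names are borrowed from the answers below; a CTE with VALUES stands in for the real relationship table so the query runs on its own.

```sql
-- Hypothetical inline sample data standing in for the real table.
WITH relationship (company_id, address_id, "begin") AS (
    VALUES (1, 10, DATE '2005-01-01'),
           (1, 11, DATE '2010-01-01'),
           (1, 12, DATE '2015-01-01'),
           (2, 20, DATE '2008-06-01')
)
SELECT r.address_id
FROM relationship r
WHERE r.company_id = 1      -- only one company at a time
ORDER BY r."begin" DESC     -- newest address first
LIMIT 1;                    -- keep only the newest row (address 12 here)
```

The LIMIT applies to the whole result, so the same query cannot return "the newest row per company" for every company at once.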

That works fine if the query returns only one company. But I need a query that returns all Table1 records, each joined with its current Table2 address. Therefore, as far as I know, the outdated rows must be filtered out in the LEFT JOIN's ON clause.

Any idea how I can build that clause so that it returns the latest address without duplicating Table1 companies?

Asked by Hikari, Feb 17 '14 19:02


3 Answers

Use a correlated (dependent) subquery with the max() function in the join condition, as in this example:

SELECT *
FROM companies c
LEFT JOIN relationship r
  ON c.company_id = r.company_id
 AND r."begin" = (
       -- latest start date for this company
       SELECT max("begin")
       FROM relationship r1
       WHERE c.company_id = r1.company_id
     )
-- LEFT (not INNER) JOIN, so companies without any address are kept
LEFT JOIN addresses a
  ON a.address_id = r.address_id;

demo: http://sqlfiddle.com/#!15/f80c6/2
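One caveat worth checking with the max() condition: if a company has two relationship rows sharing the same latest "begin", both rows match, so that company is returned twice. The sketch below demonstrates this with hypothetical inline data (CTEs stand in for the real tables).

```sql
-- Hypothetical data: company 1 has two rows with the same latest "begin",
-- company 2 has no relationship rows at all.
WITH companies (company_id) AS (VALUES (1), (2)),
relationship (company_id, address_id, "begin") AS (
    VALUES (1, 10, DATE '2010-01-01'),
           (1, 11, DATE '2015-01-01'),
           (1, 12, DATE '2015-01-01')
)
SELECT c.company_id, r.address_id
FROM companies c
LEFT JOIN relationship r
  ON c.company_id = r.company_id
 AND r."begin" = (SELECT max(r1."begin")
                  FROM relationship r1
                  WHERE r1.company_id = c.company_id)
ORDER BY c.company_id, r.address_id;
-- company 1 appears twice (addresses 11 and 12); company 2 once, with NULL
```

If such ties are possible, a tie-breaking rule is needed; the LIMIT 1 and row_number() approaches in the other answers always pick exactly one row per company.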

Answered by krokodilko, Oct 18 '22 17:10


Since PostgreSQL 9.3 there is JOIN LATERAL (https://www.postgresql.org/docs/9.4/queries-table-expressions.html), which lets you join against a subquery that refers to columns of the tables before it, so it solves your issue in an elegant way:

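SELECT * FROM companies c
-- LEFT JOIN LATERAL ... ON TRUE keeps companies that have no relationship row
LEFT JOIN LATERAL (
    -- runs once per company: newest relationship row only
    SELECT * FROM relationship r
    WHERE c.company_id = r.company_id
    ORDER BY r."begin" DESC LIMIT 1
) r ON TRUE
LEFT JOIN addresses a ON a.address_id = r.address_id;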

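The disadvantage of this approach is that indexes on the tables inside the LATERAL subquery do not help conditions written outside it: the subquery produces a derived row set, so filters on its columns must be placed inside the subquery to benefit from those indexes.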
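Inside the subquery, on the other hand, an index can make each per-company lookup cheap. A sketch, assuming a relationship table like the one in the answers (recreated here as a hypothetical temp table so the block runs on its own) and a hypothetical index name: a B-tree on (company_id, "begin" DESC) lets `WHERE company_id = ... ORDER BY "begin" DESC LIMIT 1` stop after the first matching index entry instead of sorting all of the company's rows.

```sql
-- Stand-in for the real table (hypothetical column types).
CREATE TEMP TABLE relationship (
    company_id integer NOT NULL,
    address_id integer NOT NULL,
    "begin"    date    NOT NULL,
    "end"      date    NOT NULL
);

-- Hypothetical index name; matches the WHERE + ORDER BY of the LATERAL subquery.
CREATE INDEX relationship_company_begin_idx
    ON relationship (company_id, "begin" DESC);
```

A B-tree can also be scanned backwards, so a plain (company_id, "begin") index serves the same purpose; the same index also supports the max("begin") subquery from the first answer.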

Answered by Fomalhaut, Oct 18 '22 19:10


I managed to solve it using a window function:

WITH ranked_relationship AS (
    SELECT
        *
        -- 1 = most recent relationship row of each company
        ,row_number() OVER (PARTITION BY fk_company ORDER BY dt_start DESC) AS dt_last_addr
    FROM relationship
)

SELECT
    company.*,
    address.*,
    dt_last_addr AS dt_relationship
FROM
    company
    LEFT JOIN ranked_relationship AS relationship
            ON relationship.fk_company = company.pk_company AND relationship.dt_last_addr = 1
    LEFT JOIN address ON address.pk_address = relationship.fk_address

row_number() assigns an integer counter to each row inside each window, and the windows are defined by fk_company. Because each window is ordered by dt_start DESC, the row with the latest date comes first with number 1. The condition dt_last_addr = 1 then makes the JOIN match only once per fk_company, using the row with the latest address.
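To see what the ranking CTE produces, here is a small sketch with hypothetical inline rows that reuses this answer's column names (fk_company, fk_address, dt_start):

```sql
-- Hypothetical inline rows standing in for the relationship table.
WITH relationship (fk_company, fk_address, dt_start) AS (
    VALUES (1, 10, DATE '2005-01-01'),
           (1, 11, DATE '2010-01-01'),
           (1, 12, DATE '2015-01-01'),
           (2, 20, DATE '2008-06-01')
)
SELECT fk_company, fk_address, dt_start,
       row_number() OVER (PARTITION BY fk_company
                          ORDER BY dt_start DESC) AS dt_last_addr
FROM relationship
ORDER BY fk_company, dt_last_addr;
-- fk_company | fk_address | dt_start   | dt_last_addr
--          1 |         12 | 2015-01-01 |            1
--          1 |         11 | 2010-01-01 |            2
--          1 |         10 | 2005-01-01 |            3
--          2 |         20 | 2008-06-01 |            1
```

Only the rows numbered 1 survive the dt_last_addr = 1 join condition, one per company, and row_number() never assigns the same number twice within a window even when dates tie.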

Window functions are very powerful and few people use them; they avoid many complex joins and subqueries!

Answered by Hikari, Oct 18 '22 18:10