Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient latest record query with Postgresql

I need to do a big query, but I only want the latest records.

For a single entry I would probably do something like

SELECT * FROM table WHERE id = ? ORDER BY date DESC LIMIT 1; 

But I need to pull the latest records for a large (thousands of entries) number of records, but only the latest entry.

Here's what I have. It's not very efficient. I was wondering if there's a better way.

SELECT * FROM table a WHERE ID IN $LIST AND date = (SELECT max(date) FROM table b WHERE b.id = a.id); 
like image 500
Sheldon Ross Avatar asked Nov 05 '09 22:11

Sheldon Ross


People also ask

How do I get the latest record in PostgreSQL?

Instructive way: MAXSELECT timestamp, value, card FROM my_table WHERE timestamp = ( SELECT MAX(timestamp) FROM my_table ); But without an index, two passes on the data will be necessary whereas the previous query can find the solution with only one scan.


2 Answers

If you don't want to change your data model, you can use DISTINCT ON to fetch the newest record from table "b" for each entry in "a":

SELECT DISTINCT ON (a.id) * FROM a INNER JOIN b ON a.id=b.id ORDER BY a.id, b.date DESC 

If you want to avoid a "sort" in the query, adding an index like this might help you, but I am not sure:

CREATE INDEX b_id_date ON b (id, date DESC)  SELECT DISTINCT ON (b.id) * FROM a INNER JOIN b ON a.id=b.id ORDER BY b.id, b.date DESC 

Alternatively, if you want to sort records from table "a" some way:

SELECT DISTINCT ON (sort_column, a.id) * FROM a INNER JOIN b ON a.id=b.id ORDER BY sort_column, a.id, b.date DESC 

Alternative approaches

However, all of the above queries still need to read all referenced rows from table "b", so if you have lots of data, it might still just be too slow.

You could create a new table, which only holds the newest "b" record for each a.id -- or even move those columns into the "a" table itself.

like image 119
intgr Avatar answered Oct 13 '22 23:10

intgr


this could be more eficient. Difference: query for table b is executed only 1 time, your correlated subquery is executed for every row:

SELECT *  FROM table a  JOIN (SELECT ID, max(date) maxDate         FROM table       GROUP BY ID) b ON a.ID = b.ID AND a.date = b.maxDate WHERE ID IN $LIST  
like image 24
manji Avatar answered Oct 13 '22 21:10

manji