Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to efficiently search for last record matching a condition in Rails and PostgreSQL?

Suppose you want to find the last record entered into the database (highest ID) matching a string: Model.where(:name => 'Joe'). There are 100,000+ records. There are many matches (say thousands).

What is the most efficient way to do this? Does PostgreSQL need to find all the records, or can it just find the last one? Is this a particularly slow query?

Working in Rails 3.0.7, Ruby 1.9.2 and PostgreSQL 8.3.

like image 632
B Seven Avatar asked Nov 21 '11 18:11

B Seven


1 Answers

The important part here is to have a matching index. You can try this small test setup:

Create schema xfor testing:

-- DROP SCHEMA x CASCADE;  -- to wipe it all for a retest or when done.
CREATE SCHEMA x;
CREATE TABLE x.tbl(id serial, name text);

Insert 10000 random rows:

INSERT INTO x.tbl(name) SELECT 'x' || generate_series(1,10000);

Insert another 10000 rows with repeating names:

INSERT INTO x.tbl(name) SELECT 'y' || generate_series(1,10000)%20;

Delete random 10% to make it more real life:

DELETE FROM x.tbl WHERE random() < 0.1;

ANALYZE x.tbl;

Query can look like this:

SELECT *
FROM   x.tbl
WHERE  name = 'y17'
ORDER  BY id DESC
LIMIT  1;

--> Total runtime: 5.535 ms

CREATE INDEX tbl_name_idx on x.tbl(name);

--> Total runtime: 1.228 ms

DROP INDEX x.tbl_name_idx;
CREATE INDEX tbl_name_id_idx on x.tbl(name, id);

--> Total runtime: 0.053 ms

DROP INDEX x.tbl_name_id_idx;
CREATE INDEX tbl_name_id_idx on x.tbl(name, id DESC);

--> Total runtime: 0.048 ms

DROP INDEX x.tbl_name_id_idx;
CREATE INDEX tbl_name_idx on x.tbl(name);
CLUSTER x.tbl using tbl_name_idx;

--> Total runtime: 1.144 ms

DROP INDEX x.tbl_name_id_idx;
CREATE INDEX tbl_name_id_idx on x.tbl(name, id DESC);
CLUSTER x.tbl using tbl_name_id_idx;

--> Total runtime: 0.047 ms

Conclusion

With a fitting index, the query performs more than 100x faster.
Top performer is a multicolumn index with the filter column first and the sort column last.
Matching sort order in the index helps a little in this case.

Clustering helps with the simple index, because still many columns have to be read from the table, and these can be found in adjacent blocks after clustering. It doesn't help with the multicolumn index in this case, because only one record has to be fetched from the table.
Read more about multicolumn indexes in the manual.

All of these effects grow with the size of the table. 10000 rows of two tiny columns is just a very small test case.

like image 84
Erwin Brandstetter Avatar answered Sep 27 '22 20:09

Erwin Brandstetter