I'm executing lots of sql selects as follows. Imagine we have a database with flights, where each flight might have an outbound and inbound airport of course, a departure date, a number of stops between origin and destination (on long flights), and of course price.
I now want to select a specific route, and select the one with the lowest count of stops, and of course the bestprice among them.
CREATE TABLE flights(
id integer
outbound character varying,
inbound character varying,
date timestamp,
stops integer
price numeric
);
CREATE INDEX my_idx ON flights (outbound, inbound, date, stops, price);
select * from flights where outbound = 'SFO' and inbound = 'SYD' and date = '2015-10-10' and stops < 2 order by stops asc, price asc.
Problem: the costs using explain-analyze
are quite high:
Sort (cost=9.78..9.79 rows=1 width=129) (actual time=0.055..0.055 rows=4 loops=1)
Sort Key: stops, price
Sort Method: quicksort Memory: 26kB
-> Index Scan using my_idx (cost=0.42..9.77 rows=1 width=129) (actual time=0.039..0.041 rows=4 loops=1)
Index Cond: ((date = '2015-10-10'::date) AND ((outbound)::text = 'SFO'::text) AND (stops < 2) AND ((inbound)::text = 'SYD'::text))
Total runtime: 0.079 ms
If I just sort by price without stops, the costs are ok (0.42). But sorting by stops somehow increases the costs siginificant.
How can I reduce the costs?
postgresql 9.3.2
Judging from the given numbers, your alternate query ("If I just sort by price without stops") is actually slower, and you misread the numbers. 0.079 ms
vs. 0.42
(?).
That also makes sense, because your first query matches the sort order of the index perfectly.
You already have the perfect index. The suggestion to remove price
is unfounded. The additional column removes the cost for the sort step: time=0.055..0.055
as you can see in the plan.
Either way, it should hardly matter at all. As soon as you have reduced the number of rows retrieved to a small number (with predicates on the leading columns of the index), the rest is cheap either way.
To get more interesting results, don't test with stops < 2
(which only leaves 0 and 1 stops), try with a bigger number to see any (probably small) difference.
Actually, since almost all columns are in the index already, I would try and add the one missing column id
, too - if you can get index-only scans out of this (Postgres 9.2+, read the Postgres Wiki at the linked page):
CREATE INDEX my_idx ON flights (outbound, inbound, date, stops, price, id);
SELECT id, outbound, inbound, date, stops, price
FROM ...
This is your query:
select *
from flights
where outbound = 'SFO' and inbound = 'SYD' and date = '2015-10-10' and stops < 2
order by stops asc, price asc.
The optimal index is: flights(outbound, inbound, date, stops)
. This applies to the where
clause. I don't know if there is a way to eliminate the order by
, given the where
but the sort should not be a big deal unless there are thousands of flights on that day.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With