Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to reduce the costs of an sql select with order by?

I'm executing lots of sql selects as follows. Imagine we have a database with flights, where each flight might have an outbound and inbound airport of course, a departure date, a number of stops between origin and destination (on long flights), and of course price.

I now want to select a specific route, and select the one with the lowest count of stops, and of course the bestprice among them.

CREATE TABLE flights(
    id integer
    outbound character varying,
    inbound character varying,
    date timestamp,
    stops integer
    price numeric
);
CREATE INDEX my_idx ON flights (outbound, inbound, date, stops, price);

select * from flights where outbound = 'SFO' and inbound = 'SYD' and date = '2015-10-10' and stops < 2 order by stops asc, price asc.

Problem: the costs using explain-analyze are quite high:

Sort  (cost=9.78..9.79 rows=1 width=129) (actual time=0.055..0.055 rows=4 loops=1)
  Sort Key: stops, price
  Sort Method: quicksort  Memory: 26kB
  ->  Index Scan using my_idx  (cost=0.42..9.77 rows=1 width=129) (actual time=0.039..0.041 rows=4 loops=1)
        Index Cond: ((date = '2015-10-10'::date) AND ((outbound)::text = 'SFO'::text) AND (stops < 2) AND ((inbound)::text = 'SYD'::text))
Total runtime: 0.079 ms

If I just sort by price without stops, the costs are ok (0.42). But sorting by stops somehow increases the costs siginificant.

How can I reduce the costs?

postgresql 9.3.2

like image 530
membersound Avatar asked Mar 17 '23 14:03

membersound


2 Answers

Judging from the given numbers, your alternate query ("If I just sort by price without stops") is actually slower, and you misread the numbers. 0.079 ms vs. 0.42 (?).

That also makes sense, because your first query matches the sort order of the index perfectly.

You already have the perfect index. The suggestion to remove price is unfounded. The additional column removes the cost for the sort step: time=0.055..0.055 as you can see in the plan.

Either way, it should hardly matter at all. As soon as you have reduced the number of rows retrieved to a small number (with predicates on the leading columns of the index), the rest is cheap either way.

To get more interesting results, don't test with stops < 2 (which only leaves 0 and 1 stops), try with a bigger number to see any (probably small) difference.

Actually, since almost all columns are in the index already, I would try and add the one missing column id, too - if you can get index-only scans out of this (Postgres 9.2+, read the Postgres Wiki at the linked page):

CREATE INDEX my_idx ON flights (outbound, inbound, date, stops, price, id);
SELECT id, outbound, inbound, date, stops, price
FROM ...
like image 104
Erwin Brandstetter Avatar answered Mar 20 '23 14:03

Erwin Brandstetter


This is your query:

select * 
from flights
where outbound = 'SFO' and inbound = 'SYD' and date = '2015-10-10' and stops < 2
order by stops asc, price asc.

The optimal index is: flights(outbound, inbound, date, stops). This applies to the where clause. I don't know if there is a way to eliminate the order by, given the where but the sort should not be a big deal unless there are thousands of flights on that day.

like image 23
Gordon Linoff Avatar answered Mar 20 '23 14:03

Gordon Linoff