Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm improvement on a simple looking postgresql query

High-level: Can I do this order by, group by based on sum any faster? (PG 8.4, fwiw., on a non-tiny table .... think O(millions of rows) )

Suppose I had a table like this:

                                 Table "public.summary"
   Column    |       Type        |                      Modifiers
-------------+-------------------+------------------------------------------------------
 ts          | integer           | not null default nextval('summary_ts_seq'::regclass)
 field1      | character varying | not null
 otherfield  | character varying | not null
 country     | character varying | not null
 lookups     | integer           | not null


Indexes:
    "summary_pk" PRIMARY KEY, btree (ts, field1, otherfield, country)
    "ix_summary_country" btree (country)
    "ix_summary_field1" btree (field1)
    "ix_summary_otherfield" btree (otherfield)
    "ix_summary_ts" btree (ts)

And the query I want is:

select summary.field1,
    summary.country,
    summary.ts,
    sum(summary.lookups) as lookups,
from summary
where summary.country = 'za' and
    summary.ts = 1275177600
group by summary.field1, summary.country, summary.ts
order by summary.ts, lookups desc, summary.field1
limit 100;

(English: top 100 field1's at a particular (ts,country) where 'topness' is the sum of lookups for any matching row, regardless of value of otherfield)

Is there anything I can really do to speed this up? Algorithmically this seems to be a full table scan kind of thing, but I might be missing something.

like image 421
Gregg Lind Avatar asked Oct 14 '22 03:10

Gregg Lind


1 Answers

Any query plan for this query will have to scan every row that matches the WHERE conditions, rolling them up by the grouping conditions - that is, the amount of work is proportional to the number of input rows to the group by, not the number of result rows.

The most efficient query plan possible for a query like this is a single index scan. This ought to be possible if you build an index on (country, ts) in that order; with that index, every possible query of this form resolves to a contiguous range over the index. This will still require an in-memory sort, though - it may be possible to avoid this with a different index.

As others have said, though, posting an execution plan is your best option.

like image 136
Nick Johnson Avatar answered Oct 20 '22 10:10

Nick Johnson