Slow query ordering by a column in a joined table

Introducing an ORDER BY clause in a query increases the total time because of the extra work the database has to do to sort the result set:

  • copy the resulting tuples into some temporary memory
  • sort them (hopefully in memory, otherwise on disk; see the sketch below)
  • stream the result to the client
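
As an aside on the in-memory vs. on-disk point: the Sort Method line of EXPLAIN ANALYZE shows which of the two happened, and that depends on work_mem. A minimal sketch against product_product (the settings and the commented output are illustrative, not measured):

SET work_mem = '64kB';   -- deliberately tiny: the sort spills to disk
EXPLAIN ANALYZE SELECT * FROM product_product ORDER BY name;
--   Sort Method:  external merge  Disk: ...kB

SET work_mem = '32MB';   -- large enough for this result set: the sort stays in RAM
EXPLAIN ANALYZE SELECT * FROM product_product ORDER BY name;
--   Sort Method:  quicksort  Memory: ...kB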

What I don't understand is why merely adding a column from a joined table to the ORDER BY produces such a big difference in performance.

Query 1

EXPLAIN ANALYZE
SELECT p.*
FROM product_product p
JOIN django_site d ON (p.site_id = d.id)
WHERE (p.active = true  AND p.site_id = 1 )
ORDER BY d.domain, p.ordering, p.name

Query plan

Sort  (cost=3909.83..3952.21 rows=16954 width=1086) (actual time=1120.618..1143.922 rows=16946 loops=1)
   Sort Key: django_site.domain, product_product.ordering, product_product.name
   Sort Method:  quicksort  Memory: 25517kB
   ->  Nested Loop  (cost=0.00..2718.86 rows=16954 width=1086) (actual time=0.053..87.396 rows=16946 loops=1)
         ->  Seq Scan on django_site  (cost=0.00..1.01 rows=1 width=24) (actual time=0.010..0.012 rows=1 loops=1)
               Filter: (id = 1)
         ->  Seq Scan on product_product  (cost=0.00..2548.31 rows=16954 width=1066) (actual time=0.036..44.138 rows=16946 loops=1)
               Filter: (product_product.active AND (product_product.site_id = 1))
 Total runtime: 1182.515 ms

Query 2

The same query as above, but without sorting by django_site.domain.

Query plan

 Sort  (cost=3909.83..3952.21 rows=16954 width=1066) (actual time=257.094..278.905 rows=16946 loops=1)
   Sort Key: product_product.ordering, product_product.name
   Sort Method:  quicksort  Memory: 25161kB
   ->  Nested Loop  (cost=0.00..2718.86 rows=16954 width=1066) (actual time=0.075..86.120 rows=16946 loops=1)
         ->  Seq Scan on django_site  (cost=0.00..1.01 rows=1 width=4) (actual time=0.015..0.017 rows=1 loops=1)
               Filter: (id = 1)
         ->  Seq Scan on product_product  (cost=0.00..2548.31 rows=16954 width=1066) (actual time=0.052..44.024 rows=16946 loops=1)
               Filter: (product_product.active AND (product_product.site_id = 1))
 Total runtime: 305.392 ms

This question could be related.

Edit: More details added

           Table "public.product_product"
 Column       |          Type          |  
 -------------+------------------------+---------
 id                | integer                | not null default nextval('product_product_id_seq'::regclass)
 site_id           | integer                | not null
 name              | character varying(255) | not null
 slug              | character varying(255) | not null
 sku               | character varying(255) | 
 ordering          | integer                | not null
 [snip some columns ]

 Indexes:
    "product_product_pkey" PRIMARY KEY, btree (id)
    "product_product_site_id_key" UNIQUE, btree (site_id, sku)
    "product_product_site_id_key1" UNIQUE, btree (site_id, slug)
    "product_product_site_id" btree (site_id)
    "product_product_slug" btree (slug)
    "product_product_slug_like" btree (slug varchar_pattern_ops)


                  Table "public.django_site"
 Column |          Type          | 
--------+------------------------+----------
 id     | integer                | not null default nextval('django_site_id_seq'::regclass)
 domain | character varying(100) | not null
 name   | character varying(50)  | not null
Indexes:
    "django_site_pkey" PRIMARY KEY, btree (id)

The Postgres version is 8.4.

Some table stats:

# select count(*) from django_site;
 count 
-------
     1

# select count(*) from product_product;
 count 
-------
 17540

# select active, count(*) from product_product group by active;
 active | count 
--------+-------
 f      |   591
 t      | 16949

# select site_id, count(*) from product_product group by site_id;
 site_id | count 
---------+-------
       1 | 17540
Asked by dvd, Mar 27 '12.


1 Answer

Test Case

PostgreSQL 9.1. A test database with limited resources, but more than enough for this small case. The locale used for collation will turn out to be relevant:

SHOW LC_COLLATE;

 de_AT.UTF-8
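
For context: locale-aware ordering and byte-wise "C" ordering can disagree, which is why the collation matters both for the result order and for the cost of sorting. A minimal sketch, assuming PostgreSQL 9.1+ and a UTF-8 database:

SELECT 'ä' < 'z'                   AS locale_order  -- true under de_AT.UTF-8
      ,'ä' < 'z'::text COLLATE "C" AS byte_order;   -- false: the UTF-8 bytes of 'ä' sort after 'z'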

Step 1) Reconstruct raw test environment

-- DROP SCHEMA x CASCADE;
CREATE SCHEMA x;  -- test schema

-- DROP TABLE x.django_site;
CREATE TABLE x.django_site (
 id serial primary key
,domain character varying(100) not null
,int_col int not null
);
INSERT INTO x.django_site values (1,'www.testsite.com/foodir/', 3);

-- DROP TABLE x.product;
CREATE TABLE x.product (
 id serial primary key
,site_id integer not null
,name character varying(255) not null
,slug character varying(255) not null
,sku character varying(255) 
,ordering integer not null
,active boolean not null
);

INSERT INTO x.product (site_id, name, slug, sku, ordering, active)
SELECT 1
    ,repeat(chr((random() * 255)::int + 32), (random()*255)::int)
    ,repeat(chr((random() * 255)::int + 32), (random()*255)::int)
    ,repeat(chr((random() * 255)::int + 32), (random()*255)::int)
    ,i -- ordering in sequence
    ,NOT (random()* 0.5174346569119122)::int::bool
FROM generate_series(1, 17540) AS x(i);
-- SELECT ((591::float8 / 17540)* 0.5) / (1 - (591::float8 / 17540))
-- = 0.5174346569119122

CREATE INDEX product_site_id on x.product(site_id);

Step 2) ANALYZE

    ANALYZE x.product;
    ANALYZE x.django_site;

Step 3) Reorder BY random()

-- DROP TABLE x.p;
CREATE TABLE x.p AS
SELECT *
FROM   x.product
ORDER  BY random();

ANALYZE x.p;
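
The reorder matters because ordering was generated in sequence, so x.product comes physically pre-sorted on it, while x.p does not. A quick sketch to verify the shuffle via the planner's correlation statistic (the commented values are approximate expectations):

SELECT tablename, attname, correlation
FROM   pg_stats
WHERE  schemaname = 'x'
AND    tablename IN ('product', 'p')
AND    attname = 'ordering';
-- correlation ~ 1.0 for x.product, near 0 for x.p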

Results

EXPLAIN ANALYZE
    SELECT p.*
    FROM   x.p
    JOIN   x.django_site d ON (p.site_id = d.id)
    WHERE  p.active
    AND    p.site_id = 1
--    ORDER  BY d.domain, p.ordering, p.name
--    ORDER  BY p.ordering, p.name
--    ORDER  BY d.id, p.ordering, p.name
--    ORDER  BY d.int_col, p.ordering, p.name
--    ORDER  BY p.name COLLATE "C"
--    ORDER  BY d.domain COLLATE "C", p.ordering, p.name -- dvd's final solution

Each ORDER BY variant was timed in three states of the test setup:

1) Pre ANALYZE (-> bitmap index scan)
2) Post ANALYZE (-> seq scan)
3) Reordered by random(), then ANALYZE

ORDER  BY d.domain, p.ordering, p.name

1) Total runtime: 1253.543 ms
2) Total runtime: 1250.351 ms
3) Total runtime: 1283.111 ms

ORDER  BY p.ordering, p.name

1) Total runtime: 177.266 ms
2) Total runtime: 174.556 ms
3) Total runtime: 177.797 ms

ORDER  BY d.id, p.ordering, p.name

1) Total runtime: 176.628 ms
2) Total runtime: 176.811 ms
3) Total runtime: 178.150 ms
The planner obviously factors in that d.id is functionally dependent.

ORDER  BY d.int_col, p.ordering, p.name -- integer column in other table

1) Total runtime: 242.218 ms -- !!
2) Total runtime: 245.234 ms
3) Total runtime: 254.581 ms
The planner obviously misses that d.int_col (NOT NULL) is just as functionally dependent. But sorting by an integer column is cheap.

ORDER  BY p.name -- varchar(255) in same table

1) Total runtime: 2259.171 ms -- !!
2) Total runtime: 2257.650 ms
3) Total runtime: 2258.282 ms
Sorting by a (long) varchar or text column is expensive ...

ORDER  BY p.name COLLATE "C"

1) Total runtime: 327.516 ms -- !!
2) Total runtime: 325.103 ms
3) Total runtime: 327.206 ms
... but far less expensive when done without a locale.

With the locale out of the way, sorting by a varchar column is not quite as fast as sorting by an integer column, but it comes close. Locale "C" is effectively "no locale, just order by byte value". I quote the manual:

If you want the system to behave as if it had no locale support, use the special locale name C, or equivalently POSIX.
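
As a side note, PostgreSQL 9.1+ lists the collations the COLLATE clause accepts in a system catalog, so "C" can be verified there:

SELECT collname FROM pg_collation ORDER BY 1;
-- "C", "POSIX" and "default" are always present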


Putting it all together, @dvd chose:

ORDER  BY d.domain COLLATE "C", p.ordering, p.name

... 3) Total runtime: 275.854 ms
That should do.
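
For reference, that ORDER BY spelled out against the original query from the question (the per-expression COLLATE clause requires PostgreSQL 9.1 or later, so it won't run as-is on the 8.4 server from the question):

SELECT p.*
FROM   product_product p
JOIN   django_site d ON p.site_id = d.id
WHERE  p.active
AND    p.site_id = 1
ORDER  BY d.domain COLLATE "C", p.ordering, p.name;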

Answered by Erwin Brandstetter.